[57] ABSTRACT

A system (10) for analyzing a data file containing a plurality 
of data records with each data record containing a plurality 
of parameters is provided. The system (10) includes an input 
(40) for receiving the data file and a data processor (32) 
having at least one of several data processing functions.
These data processing functions include, for example, a 
segmentation function (34) for segmenting the data records 
into a plurality of segments based on the parameters. The 
data processing functions also include a clustering function 
(36) for clustering the data records into a plurality of clusters 
containing data records having similar parameters. A
prediction function (38) for predicting expected future results
from the parameters in the data records may also be provided 
with the data processor (32). 
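
The three data processing functions named in the abstract can be pictured with a minimal sketch. It is illustrative only and is not the disclosed implementation: the record layout (a dictionary of numeric parameters), the function names `segment`, `cluster`, and `fit_line`, and the "remainder" bin name are all assumptions. Plain k-means stands in for the clustering function and a least-squares line stands in for the prediction function.

```python
from statistics import fmean


def segment(records, rules):
    """Segmentation: place each record in the first bin whose test it passes."""
    bins = {name: [] for name in rules}
    bins["remainder"] = []  # records matching no rule (bin name is an assumption)
    for rec in records:
        for name, test in rules.items():
            if test(rec):
                bins[name].append(rec)
                break
        else:
            bins["remainder"].append(rec)
    return bins


def cluster(records, params, k, iterations=10):
    """Clustering: plain k-means, used here only as a stand-in technique."""
    # Seed the cluster centers with the first k records.
    centers = [[rec[p] for p in params] for rec in records[:k]]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for rec in records:
            point = [rec[p] for p in params]
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])),
            )
            groups[nearest].append(rec)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [fmean(m[p] for m in members) for p in params]
    return groups


def fit_line(records, x, y):
    """Prediction: least-squares fit of y = intercept + slope * x (a stand-in)."""
    xs = [rec[x] for rec in records]
    ys = [rec[y] for rec in records]
    mx, my = fmean(xs), fmean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs
    )
    return my - slope * mx, slope


if __name__ == "__main__":
    # Hypothetical customer records; parameter names are invented for illustration.
    records = [
        {"age": 22, "married": 0, "spend": 100},
        {"age": 25, "married": 0, "spend": 130},
        {"age": 47, "married": 1, "spend": 350},
        {"age": 52, "married": 1, "spend": 400},
    ]
    bins = segment(records, {"unmarried": lambda r: r["married"] == 0})
    print({name: len(members) for name, members in bins.items()})
    print([len(g) for g in cluster(records, ["age", "spend"], k=2)])
    intercept, slope = fit_line(records, "age", "spend")
    print(intercept + slope * 30)  # predicted spend for a 30 year old
```

Each function takes the same list of records, so the output of one (for example, a single segment) may be passed to another, in the spirit of combining results from several tools.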
DATA ANALYSIS SYSTEM AND METHOD

TECHNICAL FIELD OF THE INVENTION

This invention relates in general to the field of data analysis, and more particularly to a statistical analysis system and method for analyzing data.

BACKGROUND OF THE INVENTION

In recent years advancements in technology have reduced the cost of computers to the point where nearly every event in one's day is recorded by a computer. Events recorded by computer are numerous and include, for example, every transaction made by an individual. Computers store the data associated with the transactions they process and this results in sometimes large database(s) of information.

The problem, therefore, arises of how to make efficient use of the tremendous amount of information in these database(s). When the number of records in a database rises to a sufficiently large level, simply sorting the information in the database provides no meaningful results. While statistical analysis of the records in a database may yield useful information, such analysis generally requires that persons with advanced training in math or computer science perform the analysis and understand the results of the analysis. Additionally, translation of the statistical analysis of the information in a large database into a form that may be useful for such activities as marketing is also difficult. Such a situation may prevent the effective use of the information in a database and preclude the use of a possible valuable resource.

SUMMARY OF THE INVENTION

In accordance with the present invention, a data analysis system and method are provided that substantially eliminate or reduce disadvantages and problems associated with previously developed data analysis tools.

One aspect of the present invention provides a system for analyzing a data file containing a plurality of data records with each data record containing a plurality of parameters. The system includes an input for receiving the data file and a data processor having at least one of several data processing functions. These data processing functions include, for example, a segmentation function for segmenting the data records into a plurality of segments based on the parameters. The data processing functions also include a clustering function for clustering the data records into a plurality of clusters containing data records having similar parameters. The clustering function can also generate cluster maps depicting the number of records in each cluster. A prediction function for predicting expected future results from the parameters in the data records may also be provided with the data processing function.

Another aspect of the present invention provides a system for analyzing a data file containing a plurality of customer data records, each data record containing a plurality of customer parameters. The system includes an input for receiving the data file and a data processor for processing the data records. The data processor includes a segmentation function for segmenting the customer data records into a plurality of segments based on the parameters. The data processor also includes a clustering function for clustering the customer data records into a plurality of customer groups having similar parameters. A prediction function for predicting customer behavior from the customer data records is also provided with the data processor.

Yet another aspect of the present invention provides a method for analyzing a data file containing a plurality of data records, each data record containing a plurality of parameters. The method further includes the steps of inputting the data file and processing the data file. Processing the data file includes segmenting the data records into a plurality of segments based on the parameters, clustering the data records into a plurality of clusters containing data records having similar parameters, and predicting expected future results from the parameters in the data records.

The present invention provides several technical advantages. One technical advantage of the present invention is that it provides a user-friendly computer system and method for performing statistical analysis on the information within a database.

Another technical advantage of the present invention is that it provides several statistical analysis tools within a single computer system. Each tool may be used to perform statistical analysis on the information within a database. Additionally, the results of the analysis from several tools may be combined for enhanced statistical data analysis.

Yet another technical advantage of the present invention is that it may be used to identify complex patterns and relationships within large quantities of information. By defining these patterns and relationships in, for example, customer information, targeted marketing or promotion activities may be developed.

An additional technical advantage of the present invention is that it may be used in developing a marketing program for identifying customers that are most likely to respond to the marketing program. Moreover, it may be used to profile [illegible] socio-demographic or behavioral characteristics within the customer groups. It also provides for identifying significant associations between customer behavior, lifestyle, or attitudinal features, and may be used to identify significant associations between customer purchase preferences.

Another technical advantage of the present invention is that it provides for segmenting records into logical groups.

Yet another technical advantage of the present invention is that it provides for clustering records into statistically significant groups.

Yet another technical advantage of the present invention is that it may be used to predict customer or potential customer behavior, including, for example, propensity to respond to direct mail or telemarketing, product preference, profitability, credit risk, and probability of attrition. The present invention also provides a technical advantage of identifying "unusual" customers and potentially fraudulent behavior by those customers.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings in which like reference numbers indicate like features and wherein:

FIG. 1 shows an exemplary system for data analysis in accordance with concepts of the present invention;

FIG. 2 is an exemplary data input window for use with the present invention;

FIG. 3 is an exemplary flowchart for rule based segmentation in accordance with the present invention;

FIG. 4 illustrates a rule based segmentation window in accordance with one aspect of the present data analysis invention;
FIG. 5 shows a rule based segmentation setup window for use with the present invention;

FIG. 6 illustrates an exemplary window for editing a bin within the rule based segmentation function of the present data analysis invention;

FIG. 7 depicts an exemplary parameter distribution window including histograms available with the rule based segmentation function in accordance with the present invention;

FIG. 8 is an exemplary flowchart for a neural clustering function available with the present invention;

FIGS. 9A and 9B illustrate a clustering process in accordance with the present system;

FIG. 10 illustrates an exemplary neural clustering window for use with the neural clustering function available with the present data analysis system;

FIG. 11 shows an exemplary dialog window for setting up a neural clustering run in accordance with the present data analysis invention;

FIG. 12 depicts an exemplary parameter selection window for use with the neural clustering function available with the present invention;

FIG. 13 illustrates an exemplary clustering analysis window in accordance with the present data analysis invention;

FIG. 14 depicts an exemplary graph options dialog box available with the neural clustering function of the present data analysis invention;

FIG. 15 illustrates an exemplary parameter distribution window with histograms available with the present invention;

FIGS. 16A and 16B illustrate a multi-layer perceptron network and neuron, respectively, used in one embodiment of the neural prediction function of the present invention;

FIG. 17 illustrates an exemplary flowchart for the neural prediction function in accordance with the present invention;

FIG. 18 shows an exemplary neural network file specification window for use with the neural prediction function of the present invention;

FIG. 19 depicts an exemplary input file selection dialog box for use with the neural prediction function of the present invention;

FIG. 20 shows an exemplary specify parameters window for use with the neural prediction function available with the present data analysis invention;

FIG. 21 illustrates an exemplary encode input parameter dialog box for use with the neural prediction function of the present invention;

FIG. 22 depicts an exemplary run neural network window for use with the neural prediction function of the present invention;

FIG. 23 shows an exemplary edit network configuration window for use with the neural prediction function of the present invention;

FIG. 24 shows an exemplary neural prediction edit run window for use with the neural prediction window of the present data analysis invention;

FIG. 25 illustrates an exemplary text neural network results window in accordance with the neural prediction function of the present invention;

FIG. 26 shows an exemplary graphical neural network results window for the neural prediction function of the present invention;

FIG. 27 depicts an exemplary dialog box for defining a graph's characteristics generated with the neural prediction function of the present invention;

FIG. 28 illustrates an exemplary post-network graphical analysis window with the neural prediction function of the present system and method;

FIG. 29 illustrates a data merge operation in accordance with the present invention;

FIG. 30 shows an exemplary data merge window for use with a data merge in accordance with the present invention;

FIG. 31 illustrates a data append operation in accordance with the present invention;

FIG. 32 depicts an exemplary data append window for use with a data append in accordance with the present invention;

FIG. 33 shows an exemplary data paring window for use with data paring in accordance with the present invention;

FIG. 34 illustrates an exemplary data output window for outputting data in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Preferred embodiments of the present invention are illustrated in the FIGURES, like numerals being used to refer to like and corresponding parts of the various drawings.

FIG. 1 shows data analysis system 10 embodying concepts of the present invention. Data analysis system 10 preferably includes processor 12, random access memory (RAM) 14, read only memory (ROM) 16, pointing device 18, keyboard 20, and various output device(s). The output device(s) for system 10 may include, for example, external memory devices such as tape drive 22 and disk drive(s) 24, printer 26, and display 28. Data analysis system 10 also preferably includes modem 30 for making connections to external communication mediums. Data analysis system 10 is not limited to any particular hardware embodiment and may be implemented in one or more computer systems. Processor 12 in system 10 is adapted to execute many types of computer instructions in many computer languages for implementing the functions available in data analysis system 10.

Data analysis system 10 in FIG. 1 provides an advanced statistical analysis tool for analyzing databases containing many different types of data. Although system 10 may be used to analyze databases containing a variety of information, system 10 has been successfully implemented [illegible] analyzing [illegible]. Data analysis system 10 may provide significant benefits with its capability to identify complex patterns and relationships within large quantities of information. To that end, system 10 includes several functions.

System 10 preferably includes data processor 32 that is supported by processor 12. Within data processor 32 are preferably rule based segmentation function 34, neural clustering function 36, and neural prediction function 38. Data processor 32 uses data acquisition and output function 40 and data management function 42 to receive and manipulate data in performing data analysis. Such data is typically found in one or more database(s) 44 that may be stored on tape drive 22 or disk drive(s) 24.

Data acquisition and output function 40 is responsible for receiving data from database(s) 44 and formatting the data for processing by data processor 32. In one embodiment of the present invention, data acquisition and output function 40 receives customer data in a flat ASCII format from database(s) 44 and converts it into a concise internal binary form for use by data processor 32. Data acquisition and output function 40 preferably includes a data dictionary
function that allows for setting up and customizing parameter names for all of the parameters in a given database and will be described in discussions relating to FIG. 2.

Data management function 42 of data analysis system 10 allows for concatenating different sets of records, either different sets of records with the same data fields or different sets of fields for the same records. Data management function 42 allows for pruning out redundant records in a data set and converting binary records into an appropriate format for use by other systems.

Data processor 32 preferably includes several functions for performing data analysis. Rule based segmentation function 34 within data processor 32 preferably provides a mixture of query, segmentation, and statistical analysis capabilities. Rule based segmentation function 34 may provide a flexible and powerful facility for the investigation of data. Rule based segmentation function 34 may provide statistical information on each parameter in a data set for all records in the data set or for a given record segment. The segmentation tool also allows for splitting data into a set of hierarchically organized logical groups or tree structures. The segmentation process may be controlled by simple rules specified in "English-like" tests for each branch in the hierarchy. The segmentation logic in rule based segmentation function 34 is easy to understand and modify and can be interactively modified for further pruning or further segmenting of the records in a data set to create a structure of any degree of complexity. This segmentation capability, combined with a statistical analysis capability, provides an efficient, flexible, and interactive system and method for analyzing and partitioning large quantities of data.

Data processor 32 within data analysis system 10 also preferably includes neural clustering function 36. Neural clustering function 36 clusters records into statistically significant groups of similar records. This function can identify the characteristic profiles of the groups and ranks the defining parameters in accordance with significance and differences from the population average. This capability is a powerful and computationally efficient form of statistical clustering and provides a method of discovering discriminatory patterns and associations within large quantities of unexplored information. Previously unknown relationships in the data may be uncovered and expected relationships verified quickly and easily with neural clustering function 36 of the present invention.

Neural prediction function 38 within data processor 32 provides the capability to predict future behavior or relationships of the items represented by a particular data set. Using a statistical machine learning technology that has the ability to learn from historical behavior stored in a database, neural prediction function 38 can be used to predict many aspects of behavior for which records of historical behavior are held in a data set.

While data processor 32 shown in FIG. 1 includes rule based segmentation function 34, neural clustering function 36, and neural prediction function 38, the present data analysis system and method is not limited to these functions. Some embodiments of the present invention may include only one of these functions while other embodiments may include additional functions not shown in FIG. 2. The number and type of data processing functions in system 10 may therefore be varied without departing from the spirit and scope of the present invention.

Data analysis system 10 may be used to analyze database(s) of information of many types and is not limited to any particular database content. Data analysis system 10 is particularly beneficial in analyzing customer databases that include information on the types of products purchased, frequency of purchase, quantity of purchase, and other general information on customers, e.g., age, gender, marital status, etc. An example of such a database is the [illegible] database available from NDL International Ltd. In further describing data analysis system 10, reference is made to customer databases, but these references are not intended in a limiting sense as the present data analysis system may be used for analyzing many different databases of information without departing from the spirit and scope of the present invention. One embodiment of data analysis system 10 is available from Electronic Data Systems Corporation under the trade name AcuSTAR.

As a first step in using data analysis system 10 in FIG. 1, an input data set or file may be retrieved from database(s) 44 stored in tape drive 22 or disk drive(s) 24. As previously stated, data acquisition and output function 40 of system 10 provides the necessary data input capability to convert raw data in database(s) 44 into a form that can be used by data processor 32. In one embodiment of data analysis system 10, data in database(s) 44 is in ASCII data flat file format. This means that the data is provided in the form:

Record 1 a b c d . . .
Record 2 a b c d . . .
Record 3 a b c d . . .

[illegible] may be presented in typically one of [illegible] separated, comma-separated, or [illegible].

In inputting the data in database(s) 44, data acquisition and output function 40 converts input data in raw text files to an appropriate binary format. In one embodiment of system 10, function 40 maps numeric columns in a text file to parameters in a binary data file. Data files input by data acquisition and output function 40 may be used by data processor 32 of data analysis system 10.

FIG. 2 illustrates an exemplary data input window 45 that provides a graphical input screen for retrieving and converting a data file from database(s) 44 into a format that may be used by data analysis system 10. Processor 12 generates window 45, as well as all other windows described hereinafter, on display 28 using standard graphical user interface (GUI) protocols and techniques.

FIG. 2 also illustrates main toolbar 46 that is generally available with the windows provided by data analysis system 10. Toolbar 46 provides access to the functions available within data analysis system 10 of the present invention. The buttons in main toolbar 46, as well as all other buttons described hereinafter, may be selected and activated using standard "select and click" techniques with pointing device 18 and display 28. The number and design of the buttons shown for toolbar 46 in FIG. 2 are exemplary of the buttons that may be included with toolbar 46. The number and design of the buttons may be modified as necessary without departing from the spirit and scope of the present invention.

Main toolbar 46 includes data input button 47 that may be selected to access the data input capability of data acquisition and output function 40. Rule based segmentation button 48 may be selected to access rule based segmentation function 34. Clustering buttons 49 may be selected to access neural clustering function 36. Prediction buttons 50 may be selected to access neural prediction function 38. Data merge button 51 may be selected to access a data merge capability within data management function 42. Data append button 52



6,026,397

provides access to a data append function within data management function 42. Data paring button 53 may be selected to access a data paring capability within data management function 42. Data output button 54 may be selected to output a data file via data acquisition and output function 40 once the results of the processing of a data file with data processor 32 are complete. Also, within main toolbar 46 is exit button 55, which may be selected at any time to exit data analysis system 10.

Data input window 45 in FIG. 2 also includes data input toolbar 56. Toolbar 56 includes initiate new input configuration button 57, open existing input data configuration button 58, save data input configuration button 59, save data configuration as button 60, run data input function button 61, stop data input button 62, and close data input function button 63. The buttons in toolbar 56 provide a simple method for entering standard commands when inputting a data file.

Data input window 45 in FIG. 2 preferably includes input file section 64. Input file section 64 includes input file name field 64a and browse button 64b that may be selected to view a list of available input files. Input file section 64 also preferably includes start line field 64c that specifies the line in the input file where data begins. Delimiter field 64d identifies the type of delimiter used in the input file, and parameters field 64e indicates the number of parameters in the input file. By selecting configure button 64f in input file section 64, data acquisition and output function 40 will attempt to read the format of the selected data file to complete the information in start line field 64c, delimiter field 64d, and parameters field 64e. Selecting view button 64g in input file section 64 displays in message section 65 of window 45 the first several records in the input data file. An example of the type of data to be provided in message section 65 is shown in FIG. 2.

Data input window 45 also includes parameter name section 66 that allows for associating a textual identifier, i.e., a name, with each column of data in the input data file. In one embodiment of the present invention, a default name is given to each column of data in the form of PARAM00N, where N is the Nth data column. Parameter names can be generated in at least two ways, either from a parameter name file, which includes a name for each data parameter, or via keyboard 20. An example of this is illustrated in parameter name list section 66a shown in FIG. 2. In order to use a file for the names of the parameters in the input file, use file checkbox 66b is selected, and by selecting browse button 66c a dialogue window will be produced for selecting a file from a preexisting list of files. The selected file for the parameter names for the input file appears in file name field 66d.

Alternatively, parameter names for the input file may be created with keyboard 20. In this alternate method, each default parameter name is selected in parameter name list section 66a and a new name may be entered in new name field 66e with keyboard 20 and accepted by selecting replace button 66f.

Data input window 45 in FIG. 2 also preferably includes processing range section 67 that allows for specifying whether the whole or only a portion of the input file is to be processed. To process the entire input file, whole input file checkbox 67a is selected. Alternatively, if only a portion of the input file is to be processed, a start record may be input to start field 67b and an end record may be input to end field 67c. To process to the end of the input file, end file checkbox 67d may be selected.

Using the input function of data acquisition and output function 40, a data file in database(s) 44 can be processed appropriately for further use with data analysis system 10.

FIG. 3 shows an exemplary flowchart for rule based segmentation function 34 of data analysis system 10. As previously noted, rule based segmentation function 34 allows for applying flexible rule based segmentation techniques to organize data hierarchically. When operating on customer databases, rule based segmentation function 34 provides for market segmentation and customer scoring. By [illegible] data, which may be [illegible] segments. Statistics on the data, e.g., occupancy, mean, and [illegible] and maximum values for each segment may then be examined, histograms may be plotted, and these results analyzed.

One embodiment of rule based segmentation function 34 in accordance with the present invention has three main functions. The first function provides for subdivision of database records, e.g., customer population, into a hierarchy of logical segments. The second function provides for identification of high level statistics for each segment, and the third provides for identifying detailed statistical distributions for each segment.

In FIG. 3 rule based segmentation function 34 begins at step 68 whenever, for example, rule based segmentation button 48 on main toolbar 46 is selected. At step 69 the data to be analyzed with rule based segmentation function 34 is appropriately formatted as described in discussions relating to FIG. 2. Proceeding to step 70 an initial analysis of the data may be accomplished that provides basic statistics on the data including, for example, a mean, standard deviation, [illegible] maximum values for each field in the data.

At step 71 of rule based segmentation function 34 as depicted in FIG. 3, initial segmentation of the data file may be accomplished. For example, when dealing with a customer database it may be desirable to segment customer records into married and non-married customers. To do this, a user simply defines a two branch segment to be created from the total population of records, the first branch on this segment being defined by the rule "MARRIED=0" with the second segment having all remaining non-married customers.

Once the initial segmentation at step 71 is complete, expanding the initial segmentation is possible. This is a simple process with rule based segmentation function 34 of data analysis system 10 of the present invention and is accomplished by moving to any point in an existing segment and either inserting or removing branches on that segmentation level to a further level. Examples of further segmentation in accordance with step 72 in FIG. 3 will be described in discussions relating to FIG. 4. Once the further segmentation at step 72 is complete, statistics on any parameter for all segments or for comparing the parameters distribution between any two segments may be viewed on display 28. Once the desired segmentation is complete, rule based segmentation function 34 is exited at step 73.

Rule based segmentation function 34 may be used for several purposes that include, for example, selectively partitioning the data for more manageable analyses, examining trends in the data, gaining an intuitive feel for the content of the data, excluding rogue samples in the data from being included in any predictive or clustering models, examining the results of neural clustering function 36, e.g., occupancy and profile of a particular set of clusters, and examining the output and distribution from neural prediction function 38.

FIG. 4 shows an exemplary rule based segmentation window 74. Window 74 preferably includes toolbar 80 containing several buttons for providing predetermined
commands within rule based segmentation function 34. Rule based segmentation toolbar 80 includes initiate new input configuration button 82, open existing configuration button 84, save data input configuration button 86, and save data input configuration as button 88.

Rule based segmentation toolbar 80 includes run button 90 and stop button 92. Toolbar 80 also preferably includes display summary file button 94 that may be selected to display a summary on a completed segmentation run on a given data set. Histogram plotter button 96 in toolbar 80 may be selected to prepare histograms for the segmentation configuration on a given data set. Also, rule based segmentation toolbar 80 includes exit button 98, which may be selected at any time to exit rule based segmentation function 34.

Rule based segmentation window 74 as shown in FIG. 4 also includes segmentation results section 100 providing an example of a segmentation run on a data file. Section 100 includes information on the segment number (Bin), segment test (Test Name), size of the segment (Size), percent of the total segment (%Total), percent of the Parent Segment (%Parent), mean for the segment (Mean), the segment's standard deviation (SD), minimum value for the segment (Min), and the maximum value for the segment (Max). In the example shown in FIG. 4, the data file includes 20,000 records as indicated by the Size for All Records Bin 0.

A file may be selected for segmentation with rule based segmentation 34 by selecting setup button 102 in window 74. Selecting setup button 102 activates rule based segmentation setup window 104 shown in FIG. 5.

FIG. 5 shows an exemplary rule based segmentation setup window 104 for selecting a file for processing in rule based segmentation function 34. Window 104 allows for specifying the input file for processing in input file field 106. A

Several actions may be performed on bins in rule based segmentation function 34. For example, a name may be defined for each bin using a string of characters. A logical test for the bin may be specified. A bin may be saved to a file for use in later analysis. Also, bins at the same level may be added together, and another bin at the same level may be [illegible] bin. Bins may be segmented further and a bin may be deleted along with all of its "dependent bins," except for the remainder bin.

As previously noted, rule based segmentation window 74 includes segmentation results 100 showing the results of a segmentation configuration applied to a particular data set. The example segmentation shown in FIG. 4 includes 14 bins. Bin 0 is the "All Records" bin and contains all members of the data set. All segmentation of the data set is therefore performed on Bin 0. The levels of segmentation are indicated in segmentation results 100 by bin number and test name with bins at the same level being tabbed over the same distance under Bin 0 All Records. Therefore, in the example shown in FIG. 4, the All Records bin was initially segmented into Bin 1 "Male" and Bin 8 "Female". From these initial segmentation levels, the male and female bins were further segmented into "Unmarried" and "Married" bins, which in turn were each further segmented into "Young" and "Old" bins.

For each bin in window 74 the number of members in that bin is shown in the "Size" column. The percentage the number of members in a particular bin represents with respect to the total number of records is shown in the "%Total" column. For those bins having "parent-bins," i.e., all bins in the example of FIG. 4 except Bin 0 All Records, the percentage the bin represents with respect to its parent bin is shown in the "%Parent" column. For each bin the

listing of available data files may be accessed by selecting "Mean", standard deviation ("SD"), and minimum (MIN) 

browse button 106a. The portion or range of the file to be 35 and maximum (MAX) values are provided in segmentation 

processed may be specified in range input field 108. Select results 100. 

button 109 may be chosen to select a range within the input Rule base segmentation window 74 also includes param- 

file to be processed. eler pop-up field 114, which may be used to select the 

Window 104 also preferably includes summary file parameter that segmentation results 100 is based on. 

checkbox 110, which may be selected to generate a summary 40 Therefore, in the example of FIG. 4 the parameter "AGE" 

file for the segmentation process. The summary file name and its attendant statistics based on the specified segmenta- 

may be input to summary file name field UOa, and a list of tion are shown in segmentation results 100. Parameter 

potential summary file names may be viewed by selecting pop-up field 114 may be used to select the other parameters 

browse button 1106. Window 104 also includes produce in the data set for performing a new segmentation on the data 

output data files checkbox 111, which when selected, causes 45 set. 

rule based segmentation function 34 to create output data Rule based segmentation window 74 also preferably 

files containing the results of the segmentation process. As includes action buttons 116 that may be used to perform 

shown in the example of FIG. 5, the file in input file field 106 predetermined actions on the particular bin or bins of a 

is "C:\AB C\DATA.BDT," and the range for scanning this segmentation. Once an existing bin is selected, selecting add 

file in range input field 108 is the "Whole file." 50 button 118 allows adding another bin after the selected bin 

Dialogue box 104 in FIG. 5 also includes bin type at the same level of the selected bin. Insert button 120 

selection 112. In segmenting the data in a data file with rule performs the same function as add button 118 except that the 

based segmentation function 34, the records in the data file new bin is introduced before the selected bin at the same 

are sorted into segments or bins. These bins may be exclu- level as the selected bin. Selecting segment button 122 

sive or non-exclusive. The default bin type calls for exclu- 55 creates a "child-bin" at the next level of indentation firom the 

sive bins. Exclusive bins are completely separate in terms of selected bin. Activating remove button 126 removes the 

membership. Therefore, with exclusive bins, each member selected bin. 

of a data file can only be in one bin. Alternatively, when Action buttons 116 in rule based segmentation window 74 

using non-exclusive bins, individual members can occupy also include edit button 124, which may be used to edit a 

one or more bins. 60 selected bin or segment. The add, insert, segment, and edit 

In rule based segmentation function 34, each level of bin operations are all very similar, and an example window 

segmentation may include an arbitrary number of bins. This for editing a selected bin in response to the selection of edit 

is achieved by defining a logical test for each bin, except for button 124 is shown in FIG. 6 and is representative of the 

the remainder bin, which contains all members that do not windows provided when add button 128, insert button 120, 

fall within the other specified bins. The remainder bin can be 65 or segment button 122 are selected, 

renamed at any lime, saved to a file, or segmented further, FIG. 6 shows an exemplary edit bin window 128 that may 

however, preferably should not be deleted. be used for editing a bin used with rule based segmentation 
function 34. Window 128 includes bin name field 130 as
well as parent bin name field 132. Therefore, in the example
window shown in FIG. 6, the bin being edited is the bin
named "ages 25-35" that is a child-bin of a bin named "high
income." Edit bin window 128 in FIG. 6 also preferably
includes bin test field 133, which shows the test for the bin
being edited. Appropriately, for the bin named "age 25-35"
the test in field 133 is "AGE>=25 & AGE<=35." A test in
test field 133 may be validated by selecting validate test
button 133a.

Edit bin window 128 also includes available parameters
section 134, which shows all the available parameters that
may be used in defining or editing a bin. Once the edits to
a bin are complete, OK button 135 may be selected to accept
the specified edit. Alternatively, the edits to a particular bin
may be canceled at any time by selecting cancel button 136.

Also, edit bin window 128 includes output file checkbox
138, which may be selected to output the contents of a
particular bin to a file. The file name to which the bin's
contents are to be output may be specified in output file
name field 140. Alternatively, browse button 142 provides a
predetermined list of files to which the segment's contents
may be stored.

In specifying the test for a bin using rule based
segmentation function 34 of data analysis system 10, standard logic
may be used. See, for example, the test shown in test field
133 in FIG. 6. Table 1 below illustrates an example
operator set for logical and relational operators for
developing tests for segmentation bins. Operators are shown in
descending precedential order.

TABLE 1

FIG. 7 shows an exemplary parameter distribution
window 144 available with rule based segmentation function 34
of data analysis system 10. Once the desired bins for a
particular segmentation have been established and applied to
a data set, a histogram plot of the data may be generated by
rule based segmentation function 34 as shown in FIG. 7. By
selecting histogram plot button 96 in toolbar 80 of rule based
segmentation window 74, parameter distribution window
144 shown in FIG. 7 is provided.

FIG. 7 illustrates parameter distribution window 144
having two histogram regions, including histogram region
146 and histogram region 148. Window 144 also preferably
includes parameter information 150 that includes parameter
list 152, which provides a list of parameters that may be
plotted in the histograms of window 144 and that indicate by
shading the name of the parameter that has been selected for
depiction in histograms 146 and 148. In the example of FIG.
7, the parameter "AGE" has been selected for plotting on the
histograms.

Parameter information region 150 also includes upper
histogram information 154, lower histogram information
156, and all records information 158. Histogram
informations 154 and 156 and all records information 158 in turn
include information on the number (η) of records, the
average (x̄), the standard deviation (σ), and the minimum and
maximum values for the data set. In the example shown
in FIG. 7, upper histogram 146 illustrates the distribution of
members within a 20,000 member class having a minimum
age of 18 and a maximum age of 94. The lower histogram
148 illustrates the distribution for the same data set but
having a minimum age of 60 and a maximum age of 94. This
different minimum age accounts for the difference between
histograms 146 and 148 and histogram informations 154 and
156.

Histograms 146 and 148 in parameter distribution
window 144 can be plotted as actual values or percentage values
by an appropriate selection in values regions 160. Also,
histograms 146 and 148 may be plotted with cumulative or
non-cumulative distribution by an appropriate selection in
distribution regions 162.

Histograms 146 and 148 in window 144 may be plotted
and printed using color coding as well as shading as shown
in FIG. 7. Histograms 146 and 148 can be saved as bit maps
so that they may be imported into other programs, e.g., word
processing, graphics, or spreadsheet programs.

FIG. 8 provides an exemplary neural clustering function
36 in accordance with the present invention. Data analysis
system 10 preferably uses unsupervised learning neural
network techniques to cluster data from a data file or from
specific segments in a data file to identify groups with
similar characteristics. Neural clustering function 36 also
provides a generic profiling capability. Neural clustering
function 36 is different from the segmentation provided with
rule based segmentation function 34 in that the clustering is
based entirely on the statistics of the data rather than
specified logic. This contrasting analysis of a data set
provides for alternate views of the data set. The results of
neural clustering function 36 may be displayed graphically
on display 28 in an easy to understand format. Information
on the distribution of parameters for records in a particular
cluster may be viewed and relationships between parameters
may be identified. Neural clustering function 36 may also be
used to identify unusual data so that it may be examined in
more detail.

Neural clustering function 36 begins at step 164 when, for
example, clustering buttons 49 in main toolbar 46 described
in discussions relating to FIG. 2 are selected. At step 166 a
cluster setup is defined. This may be accomplished by
defining the fields in the clustering process. For example,
when the data file contains customer information, the cluster
parameters may be age, gender, income, or whatever
parameter is desirable. Also, the maximum number of clusters
should be defined at step 166. The number of clusters is
preferably a square number, e.g., 3x3, 4x4, 5x5, ....

The next step in neural clustering function 36, as
represented in FIG. 8, involves initializing the cluster map at step
168. Before the actual clustering process may commence,
neural clustering function 36 preferably automatically
prepares a random set of "generic records" and assigns one
record to represent each cluster. Continuing the customer
data example, these "generic records" would be "generic
customers." Each "generic record" has a set of cluster
parameters that are randomly generated. Neural clustering
function 36 generates the clusters randomly in a
two-dimensional grid or cluster map as shown in FIG. 9A that is
an 8x8 cluster map. Cluster 1 and Cluster 2 have been
randomly identified in cluster map 170 for a particular
(undefined) set of parameters. It is noted that neural
clustering function 36 is not limited to using an 8x8 map as
shown in FIGS. 9A and 9B. Maps of other sizes may be used
without departing from the spirit and scope of the present
invention.
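The initialization of the cluster map at step 168 can be sketched as follows. This is only an illustrative sketch: the function name `init_cluster_map`, the use of values drawn uniformly from [0, 1), and the seed argument are assumptions made for the example and are not taken from the specification.

```python
import random

def init_cluster_map(width, n_params, seed=None):
    """Build a width x width grid with one random 'generic record' per cell.

    Assumption: every cluster parameter has already been rescaled so that
    a uniform draw from [0, 1) is a sensible starting value.
    """
    rng = random.Random(seed)
    return [
        [[rng.random() for _ in range(n_params)] for _ in range(width)]
        for _ in range(width)
    ]

# An 8x8 map of generic records, each with three cluster parameters.
cluster_map = init_cluster_map(width=8, n_params=3, seed=0)
print(len(cluster_map), len(cluster_map[0]), len(cluster_map[0][0]))
```

A nested list keeps the sketch free of third-party dependencies; an array library would be the more usual choice for larger maps.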
Returning to neural clustering function 36 in FIG. 8, at
step 172 the clustering process is started. In starting the
clustering process, neural clustering function 36 takes the
first record in the database, identifies the generic cluster that
is most similar to that record and then modifies that cluster
to provide a closer match to the actual record. In one
embodiment of neural clustering function 36, a Euclidean
distance is used as the matching metric between an input
record and a "generic record" cluster. Neural clustering
function 36 also modifies the clusters in the immediate
neighborhood of the chosen cluster to make them more
similar to the chosen cluster. This process is illustrated in
FIG. 9B in cluster map 176.
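A single clustering step of this kind might look like the sketch below. It assumes a square grid stored as nested lists, a fixed update rate, and a square neighborhood that wraps around the map edges; it also pulls the neighbors toward the input record, which is the common self-organizing map convention rather than something the text states. The names `best_match` and `train_step` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(cluster_map, record):
    """Return (row, col) of the generic record closest to `record`."""
    cells = ((r, c) for r in range(len(cluster_map))
             for c in range(len(cluster_map[0])))
    return min(cells, key=lambda rc: euclidean(cluster_map[rc[0]][rc[1]], record))

def train_step(cluster_map, record, rate=0.5, radius=1):
    """Pull the best match and its grid neighbours toward `record`, in place.

    Assumes the map is square and at least 2 * radius + 1 cells wide, so
    that no cell is visited twice when the neighbourhood wraps around.
    """
    size = len(cluster_map)
    br, bc = best_match(cluster_map, record)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            cell = cluster_map[(br + dr) % size][(bc + dc) % size]
            for i, value in enumerate(record):
                cell[i] += rate * (value - cell[i])
    return br, bc

grid = [[[0.0, 0.0] for _ in range(4)] for _ in range(4)]
grid[2][1] = [0.9, 0.9]
winner = train_step(grid, [1.0, 1.0], rate=0.5)
print(winner)  # (2, 1)
```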

Returning to FIG. 8, at step 174 in neural clustering 
function 36 the clustering process is completed. In doing so,
function 36 takes each successive record, identifies the best 
match in the cluster map, and, as before, modifies the cluster 
in its immediate neighborhood, as shown in FIG. 9B. This 
process may be repeated a number of times on the total data
set with the degree of modification and size of neighborhood 
modified in the cluster map gradually reducing as the 
training proceeds. The result of this is that neural clustering 
function 36 initially coarsely separates the customers into 
major groups in different parts of the cluster map. Function
36 then progressively refines these major groups into further
groups having more subtle distinct characteristics. 

Neural clustering function 36 may also have a self- 
organizing capability in that, at the end of the clustering 
process, two clusters next to each other on a cluster map will 
have a high degree of similarity, while clusters on totally 
different parts of the map will be quite different. It should 
also be noted that a cluster map has no edges, and that the 
cluster cell on the top edge of the map is actually adjacent 
to one on the bottom edge, and the same is true for cells on 
the left and right edges of the cluster map. The cluster map 
is therefore a toroidal surface.
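The wrap-around adjacency can be made concrete with a small distance helper. This is a sketch only: measuring grid distance as a Euclidean distance over wrapped row and column offsets is an assumption, and `wrap_offset` and `toroidal_distance` are hypothetical names.

```python
import math

def wrap_offset(a, b, size):
    """Shortest separation between indices a and b on a ring of `size` cells."""
    d = abs(a - b) % size
    return min(d, size - d)

def toroidal_distance(cell_a, cell_b, size):
    """Grid distance between two (row, col) cells on a size x size torus."""
    dr = wrap_offset(cell_a[0], cell_b[0], size)
    dc = wrap_offset(cell_a[1], cell_b[1], size)
    return math.hypot(dr, dc)

# A cell on the top edge is adjacent to the cell below the bottom edge.
print(toroidal_distance((0, 3), (7, 3), 8))  # 1.0
# The same holds for the left and right edges.
print(toroidal_distance((5, 0), (5, 7), 8))  # 1.0
```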

Returning to FIG. 8, the next step in neural clustering 
function 36 is step 178 where the results of the clustering 
process are analyzed. Once the clustering process is com- 
plete at step 174 a variety of analysis results are available 
with neural clustering function 36. One type of result 
available is cluster occupancy. This indicates the number of 
members in each cluster. This may be presented by color 
coding the cluster map with, for example, red clusters 
denoting high occupancy (large number of members) and 
blue clusters having low occupancy (small number of
members). 

Another type of result available with neural clustering 
function 36 is the mean, standard deviation, and minimum 
and maximum values for any parameter for each cluster. 
This may also be accomplished by color coding a cluster 
map. By selecting different parameters for display on a
color-coded cluster map, changes in the color of the cluster
map as the selected parameter changes allow for visualizing
the distribution of members for each parameter.
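The occupancy and per-cluster statistics described above can be computed from a list of records and the cluster assigned to each one. The sketch below assumes records are dictionaries keyed by parameter name and uses the population standard deviation; both choices, and the name `cluster_statistics`, are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def cluster_statistics(records, cluster_ids, parameter):
    """Occupancy, mean, SD, minimum and maximum of one parameter per cluster."""
    groups = defaultdict(list)
    for record, cid in zip(records, cluster_ids):
        groups[cid].append(record[parameter])
    return {
        cid: {
            "size": len(values),   # cluster occupancy
            "mean": mean(values),
            "sd": pstdev(values),  # population SD (an assumption)
            "min": min(values),
            "max": max(values),
        }
        for cid, values in groups.items()
    }

records = [{"AGE": 20}, {"AGE": 30}, {"AGE": 70}]
stats = cluster_statistics(records, [0, 0, 5], "AGE")
print(stats[0]["size"], stats[0]["mean"], stats[0]["sd"])
```

Mapping each value to a color scale would then give the color-coded cluster map; that rendering step is left out of the sketch.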

The next level of results preferably available with neural 
clustering function 36 allows for viewing the mean, mini- 
mum and maximum values for any selected parameter for a 
single cluster and to compare these values with population 
averages. The parameters may also be ranked in terms of 
mean value. Also, a view of the complete distribution of any 
parameter for any cluster may be provided, and distributions 
between clusters or between one cluster and the total popu- 
lation may be compared. 

Once analyze step 178 is complete, the clustering process
may be refined at step 180. In the customer database
example, refining the analysis of the clusters may provide
information about the customers, the major customer
groups, the profiles of those customers, the differentiating
factors of the groups, and any significant associations
between the defining factors of each group. Based on this it
may be desirable to recluster the data, possibly using a
different set of variables, a coarser or finer map, or using a
subset of the original data.

The next step in neural clustering function 36 as
represented in FIG. 8 is tagging step 182. One of the goals of
neural clustering function 36 is to produce a set of statistically
significant record groups, each with an intuitively sound
profile but with exploitable behavioral characteristics. In the
customer database example, clustering on demographic
information such as age, income, gender, occupation, time in
residence, etc., will produce a set of customers with similar
demographic profiles. It is invariably the case, however, that
these customer groups will have different lifestyle, attitude,
and behavioral characteristics. In particular, certain clusters
may have significantly higher than average propensity to
respond to direct marketing material for a specific product.
This can be exploited by ranking the clusters in terms of
"propensity to respond" and targeting individuals whose
profiles match the highest scoring groups. This may be
accomplished at tagging step 182 of neural clustering function 36.
Once tagging step 182 is complete, neural clustering
function 36 may be exited at step 183.
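Ranking clusters by "propensity to respond" can be sketched as a response rate per cluster. The sketch assumes that each record already carries a cluster identifier and a yes/no response flag from an earlier campaign; the function name and the input layout are hypothetical.

```python
from collections import defaultdict

def rank_clusters_by_response(cluster_ids, responded):
    """Return (cluster, response rate) pairs, highest rate first."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cid, flag in zip(cluster_ids, responded):
        totals[cid] += 1
        if flag:
            hits[cid] += 1
    rates = {cid: hits[cid] / totals[cid] for cid in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

ranking = rank_clusters_by_response(
    [1, 1, 2, 2, 2, 3],
    [True, False, True, True, False, False],
)
print([cid for cid, _ in ranking])  # [2, 1, 3]
```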

FIG. 10 shows an exemplary run neural clustering
window 184 for use with neural clustering function 36 in
accordance with the present invention. Window 184
preferably includes toolbar 186, data and configurations section
188, and detailed run information section 190. Toolbar 186
includes initiate fresh network configuration button 192,
retrieve network configuration button 194, save data input
configuration button 196, and save data input configuration
as button 198. Toolbar 186 also includes run clustering
process button 200 and stop clustering process button 202.
Additionally, toolbar 186 includes exit button 204 for exiting
neural clustering function 36.

Data and configurations section 188 in run neural
clustering window 184 provides information on each clustering
run. An existing configuration file may be loaded or
modified using retrieve network configuration button 194 in
toolbar 186. In the example shown in FIG. 10, file
"DEF.CNC" is being processed with neural clustering
function 36. Data and configurations section 188 indicates that
input file "DATA.BDT" is being used for the clustering
process, that the first "Run" has been completed and that the
clustering mode is clustering ("C") (training) mode. Also, in
the example of FIG. 10 the "Input Weights for Training" is
"none" indicating that the run is not a continuation of
previous clustering sessions. Data and configurations section
188 also includes information on the "Weights File" for a
given clustering process, and in the example shown in FIG.
10 the "Weights File" is "DEF.BKW." This file is used for
weighting purposes during a clustering run.

A new clustering configuration may be created with run
neural clustering window 184 using add button 206, edit
button 208, remove button 210, and copy button 212 in data
and configurations section 188. If an existing configuration
is not used for generating a new run, then an entirely new
clustering specification must be generated. Neural clustering
function 36 provides an appropriate window for setting up a
new run via add button 206.

FIG. 11 illustrates an exemplary clustering setup window
214 for setting up clustering parameters in accordance with
neural clustering function 36. Clustering setup window 214
preferably includes files and ranges section 216, output
parameter section 218, parameter normalization section 220,
parameter selection section 222, and training setup section
224.

Inputs to the various fields in files and ranges section 216 
are available through browse and select buttons 226. Each 
browse button pulls up a dialogue box having a list of files 
for selection. The select button pulls up a dialogue box that 
offers choices as to the range of processing for a file. The 
processing range may be either the entire file or a specified 
start and end location. 

Output parameter section 218 includes recall only check- 
box 228 and produce tagged data file checkbox 230. When 
recall only checkbox 228 is selected, neural clustering 
function 36 runs in "recall mode", i.e., using a cluster model 
created earlier. A new set of data records may then be applied
to an existing cluster map to see how the records correlate 
with an existing model. Additionally, weights in checkbox 
229 in files and ranges section 216 should be checked when 
the clustering process is run in recall mode.

Selecting produce tagged data file checkbox 230 in output 
parameter section 218 causes neural clustering function 36
to produce a data file containing all the original information 
in the data file together with an additional field containing 
the cluster identification for each record. Even when recall 
only checkbox 228 is not selected, neural clustering function 
36 performs a single recall run at the end of the allotted
number of training cycles to determine the cluster for each 
record. This corresponding data file may then be used in rule 
based segmentation function 34 to select particular clusters 
that have desirable features. 
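Producing a tagged data file and then selecting one cluster from it might be sketched as follows. The on-disk format of the data files is not described in the text, so a comma-separated layout is assumed here, and the field name `CLUSTER` and the function `write_tagged_file` are hypothetical.

```python
import csv
import io

def write_tagged_file(rows, fieldnames, cluster_ids, out):
    """Write each original record plus one extra field holding its cluster."""
    writer = csv.DictWriter(out, fieldnames=list(fieldnames) + ["CLUSTER"])
    writer.writeheader()
    for row, cid in zip(rows, cluster_ids):
        writer.writerow({**row, "CLUSTER": cid})

rows = [{"AGE": 25, "INCOME": 40000}, {"AGE": 61, "INCOME": 52000}]
buffer = io.StringIO()
write_tagged_file(rows, ["AGE", "INCOME"], [3, 7], buffer)

# The tagged file can then be filtered on the cluster field, much as a
# segmentation bin test would select the records of a particular cluster.
tagged = list(csv.DictReader(io.StringIO(buffer.getvalue())))
selected = [r for r in tagged if r["CLUSTER"] == "7"]
print(selected[0]["AGE"])  # 61
```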

Additionally, whenever a neural clustering run is 
performed, neural clustering function 36 may produce a 
textual summary file containing, for example, cluster versus 
parameter information. The file typically has a header fol- 
lowed by four main sections containing mean, standard 
deviation, and minimum and maximum data, respectively. 
Also each row in the file may contain a "used" flag indi- 
cating whether the parameter was used as an input to the 
clustering process, followed by the mean value (or the 
standard deviation, etc.), for each cluster. The file may be 
single comma delimited, and the numbers in the file may be 
output to six significant figures. The file should also pref- 
erably be formatted so that it can be easily read into other 
applications such as, for example, spreadsheet applications. 
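One possible layout for such a summary file is sketched below. Only the header, the four sections, the "used" flag, the comma delimiter, and the six significant figures come from the description; the section labels, the leading parameter name on each row, and the function name `format_summary` are assumptions.

```python
def format_summary(header, sections, used):
    """Render a comma-delimited cluster-versus-parameter summary.

    sections maps "MEAN", "SD", "MIN" and "MAX" to {parameter: [value per
    cluster]}; used maps each parameter to True if it was a clustering input.
    """
    lines = [header]
    for name in ("MEAN", "SD", "MIN", "MAX"):
        lines.append(name)
        for param, values in sections[name].items():
            flag = "1" if used[param] else "0"
            cells = [f"{v:.6g}" for v in values]  # six significant figures
            lines.append(",".join([param, flag] + cells))
    return "\n".join(lines)

sections = {
    "MEAN": {"AGE": [41.23456789, 63.5]},
    "SD": {"AGE": [10.0, 5.25]},
    "MIN": {"AGE": [18, 60]},
    "MAX": {"AGE": [94, 94]},
}
text = format_summary("cluster summary", sections, {"AGE": True})
print(text.split("\n")[2])  # AGE,1,41.2346,63.5
```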

Histogram output section 232 in clustering setup window 
214 provides a checkbox for the creation of histograms for 
each parameter in each cluster. This information is calcu- 
lated during the clustering run and stored in a separate file 
so that during analysis the distribution of parameters across 
the cluster map may be viewed and analyzed in a graphical 
manner. 

Parameter normalization section 220 in clustering setup
window 214 preferably provides three parameter normal- 
ization options. Normalization rescales the parameters so 
that they all have the same dynamic range, e.g., minimum or 
maximum values. If the data is not normalized prior to
processing, and some parameters have a large range of 
values, these values may dominate the processing and pro- 
duce erroneous results. 

One of the normalization options in section 220 is "use
mean and standard (std.) deviation (dev.)." When this
parameter normalization is selected, normalization of a
parameter involves subtracting the mean from the parameter
and dividing the result by the standard deviation for the parameter.
Therefore, if the parameters are normally distributed, the
variables will then become distributed with a mean of 0 and
variance of 1. This normalization option is generally
recommended if, for example, two parameters vary equally, but
over different ranges. This option will then normalize the
distributions so that the parameters vary over the same
range.
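The "use mean and standard deviation" option corresponds to the familiar z-score, sketched below for a single parameter. Using the population standard deviation and returning zeros for a constant parameter are assumptions of the sketch, as is the function name `normalize`.

```python
from statistics import mean, pstdev

def normalize(values):
    """Subtract the mean and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant parameter carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Two parameters that vary equally but over very different ranges.
ages = [18, 30, 42, 54, 66]
incomes = [18000, 30000, 42000, 54000, 66000]
print([round(z, 3) for z in normalize(ages)])
print([round(z, 3) for z in normalize(incomes)])
```

After normalization the two parameters occupy the same range, so neither dominates a distance calculation simply because of its units.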

Also within parameter normalization section 220 is a
"user defined offset and gain" normalization option.
Selecting this type of normalization allows for using prior
information to weight a particular parameter more strongly or
weakly when the emphasis of the parameter is known.
Parameter normalization 220 also includes a "neither"
normalization option. Selecting this option prevents
normalization of the data. This option may be useful if one or two of
the parameters are known to dominate the data compared
with all others. In this case, attention may be paid to the lesser
parameters, or they may be excluded altogether.

Unit normalized vectors checkbox 234 in clustering setup
window 214 may be selected to normalize all data to lie in
a unit sphere. This is similar to the "use mean and standard
deviation" option in parameter normalization section 220
except that the parameters are normalized all at once rather
than on a per parameter basis. This option may be useful
when it is suspected that a number of rogue data points exist
within the data set. If some parameters have much greater
variability than others, however, they will become the
dominant factors in the clustering process.

Parameter selection section 222 in exemplary
clustering setup window 214 of FIG. 11 shows general information
on the parameters that may be used in the clustering process.
Section 222 includes parameters available field 222a that
indicates the number of fields in the chosen data set and
parameters selected field 222b that indicates the number of
parameters selected for the current clustering run.
Parameters for clustering may be selected by choosing select
parameters for clustering button 222c that provides access to
a parameter selection window.

FIG. 12 illustrates an exemplary parameter selection
window 236 for selecting a parameter for use in neural
clustering function 36 of data analysis system 10 of the
present invention. Parameter selection window 236
preferably includes data file information section 238, which
includes a data file name field, a parameters available field
for the selected data file, and a parameters selected field for
specifying the number of parameters to be used in a
clustering run.

Parameter selection window 236 also preferably includes
available parameters section 240, which includes a list of all
available parameters in the specified data set. Selected
parameters section 242 in parameter selection window 236
includes the names of all the parameters that have been
selected for the current neural clustering run. Using include
button 244, remove button 246, include all button 248, and
remove all button 250, parameters may be moved between
available parameters section 240 and selected parameters
section 242 as desired for a given neural clustering run.

Once the parameter selection is complete, parameter
selection window 236 may be closed by selecting OK button 
251. Alternatively, a parameter selection process may be 
cancelled at any time by selecting cancel button 253 in 
window 236. 

Returning to FIG. 11, training setup section 224 in
clustering setup window 214 may be used to define the size of
a cluster map as well as the number of training cycles for a
cluster run. Section 224 accordingly includes map width
input field 252 for specifying the number of clusters along
one edge of a cluster map. The total number of clusters is
preferably the square of this number. A default for the
number of clusters may be set at, for example, four and a
maximum number of clusters may be limited to, for
example, thirty. Number of training cycles input field 254
may be used to input and display the number of complete
runs through a data file by neural clustering function 36.
Therefore, if the data consisted of, for example, 10,000
items and training cycles input field 254 is ten, then 10 times
10,000 equals 100,000 passes through the clustering
network. A default number of training cycles may be set at, for
example, 10 cycles.

Training setup section 224 in clustering setup window 214 
in FIG. 11 also preferably includes advanced clustering
configuration button 256. Selecting button 256 provides 
access to advanced clustering configuration section 258 in 
window 214. Section 258 includes initial update neighbor- 
hood size input field 260 that is the radius that should be set 
to approximately 30% to 40% of the total map dimensions, 
e.g., for an 8x8 cluster map input field 260 should be set to 
3 or 4. Values under 20% of the map size in initial update 
neighborhood size input field 260 could lead to the
possibility of unused clusters within the map unless the map is
further trained in a later session. A default value for field 260 
may be set to, for example, 1.9. 

Final update neighborhood size input field 262 of 
advanced clustering configuration section 258 should pref- 
erably be set between zero and one. If field 262 is set to zero, 
then every cluster has a possibility of becoming a cluster
center. If set to one, there is a possibility of a lesser
distinction between adjacent clusters. A default for field 262
is, for example, 0.5. Weight update factor input fields 264 
include an initial weight update factor field and a final 
weight update input field. The weight update factors deter- 
mine how fast the network adapts to each new example. A 
large initial weight update factor is used to quickly establish 
the network cluster structure. The final weight update factor 
is used at the end of the training process to further define the 
clustering structure. A default initial weight update factor of, 
for example, 0.9 and a default final factor of, for example,
0.1 may be suitable. 
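How the neighborhood size and weight update factor might move from their initial to their final values over a run can be sketched as below. The text says only that these quantities reduce gradually, so the linear interpolation is an assumption; the function names are hypothetical, and the default values are the example defaults quoted above.

```python
def linear_schedule(initial, final, cycle, total_cycles):
    """Interpolate from `initial` (first cycle) to `final` (last cycle)."""
    if total_cycles <= 1:
        return initial
    fraction = cycle / (total_cycles - 1)
    return initial + (final - initial) * fraction

def training_schedule(total_cycles=10, radius=(1.9, 0.5), rate=(0.9, 0.1)):
    """Return one (neighborhood size, weight update factor) pair per cycle."""
    return [
        (linear_schedule(radius[0], radius[1], cycle, total_cycles),
         linear_schedule(rate[0], rate[1], cycle, total_cycles))
        for cycle in range(total_cycles)
    ]

schedule = training_schedule()
for cycle, (size, factor) in enumerate(schedule):
    print(cycle, round(size, 3), round(factor, 3))
```

Large early values let the map settle its coarse structure quickly, while the small final values allow only fine adjustments, matching the roles the text gives to the initial and final factors.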

Also within advanced clustering configuration section 
258 is randomize training data checkbox 266. Checkbox 266 
may be selected to help avoid the possibility that some
artificial clusters may form. Training times will increase, 
however, when checkbox 266 is selected. Advanced cluster 
configuration section 258 also preferably includes force 
activation update checkbox 268. Selecting checkbox 268 
allows the cells that may have been frozen out earlier in the 
clustering process, i.e., are empty, to take part in the clus- 
tering process again. So, for example, if due to a random 
effect most of the clusters are forming in an 8x3 region of 
an 8x8 map, by selecting force activation update checkbox 
268, neural clustering function 36 will allow the clusters in 
the 8x3 region to spread out again and make full use of the 
8x8 cluster map. A default value for checkbox 268 is, for 
example, not to force activation update. 

Once all of the setup information is input into neural 
clustering setup window 214, then OK button 270 may be 
selected to initiate a clustering run. Alternatively, a cluster- 
ing setup may be cancelled at any time by selecting cancel 
button 272. 

Once a neural clustering run has been made within neural
clustering function 36, various analyses of the results are
possible. Neural clustering function 36 allows for
segmenting the data set in terms of similarity to a set of user defined
criteria. For example, in a customer marketing application,
the data set may include information on customer
socio-demographics, such as age, income, occupation, and
lifestyle interests. Neural clustering function 36 within data
analysis system 10 may be used to select a subset of these
parameters and cluster on them to determine whether they
fall into natural groupings that allow for more selective,
personalized marketing.

Continuing the customer marketing example, the present
system's clustering analysis capability provides a mechanism
for generating a better understanding of who the
customers are and how they behave. Clustering function 36
can be used to identify the most commonly occurring
customer types and also the more unusual customers. It may
also be used to identify the discriminating characteristics of
individuals who buy particular products or services, or that
behave in a particular way. Using neural clustering function
36, previously unknown relationships can be uncovered in
data and expected relationships may be verified quickly and
easily.

FIG. 13 illustrates an exemplary clustering analysis window
274. Window 274 preferably includes toolbar 276,
cluster map 278, parameter statistics information 280, and
parameter graphs 282. Toolbar 276 includes initiate new
input data configuration button 284, open new input data
configuration button 286, save data input configuration
button 288, and save data input configuration as button 290.
Select results button 292 in toolbar 276 may be used to select
a particular results file for further analysis in clustering
analysis window 274.

Occupancy (η) button 294 in toolbar 276 may be selected
to view the number of members in each cluster. Mean or
average value (x̄) button 296 may be selected to view the
average value of a currently selected parameter for each cluster.
Standard deviation (σ) button 298 may be selected to view
the standard deviation for a currently selected parameter for
each cluster. Minimum value (x) button 300 and maximum
value (x) button 302 may be used to view the minimum and
maximum value of a currently selected parameter for each
cluster, respectively.

Also, preferably contained in toolbar 276 is current
parameter selection field 304 that displays the name of the
current parameter selection. Initiate histogram plotter button
306 in toolbar 276 may be selected to initiate the plotting of
a histogram while exit button 308 in toolbar 276 may be
selected at any time to exit the clustering analysis portion of
neural clustering function 36.

In one embodiment of the present invention, cluster map
278 is displayed as a multi-hued square grid. Alternatively,
appropriate grey-scalings or shading, as shown in FIG. 13,
may be employed for cluster map 278. Each cell in cluster
map 278 may be color-coded representing minimum and
maximum values for the selected parameter. Unoccupied
cells may be displayed in gray, and cells of similar color
aggregate to form a cluster. Cluster map 278 is a continuous
surface and there are no edges to the map. This means that
if a cluster forms on the bottom edge and there is a similar
cluster directly above the top edge then these cells preferably
aggregate to form the same cluster. The user may
interact with the cluster map via pointing device 18. Box 310
denotes the current cluster cell selection.
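The "no edges" property of the cluster map can be sketched as wrap-around indexing on the grid. This is a minimal illustration assuming an 8x8 map and four-way adjacency; the helper names are hypothetical and the source does not state which neighbourhood the map uses.

```python
def wrapped_offset(a, b, size):
    """Shortest distance between indices a and b on a ring of `size`."""
    d = abs(a - b) % size
    return min(d, size - d)


def toroidal_neighbors(row, col, rows=8, cols=8):
    """Four edge-adjacent cells of (row, col) on a wrap-around grid."""
    return [
        ((row - 1) % rows, col),
        ((row + 1) % rows, col),
        (row, (col - 1) % cols),
        (row, (col + 1) % cols),
    ]


def are_adjacent(cell_a, cell_b, rows=8, cols=8):
    """True if two cells touch once the map edges are joined."""
    dr = wrapped_offset(cell_a[0], cell_b[0], rows)
    dc = wrapped_offset(cell_a[1], cell_b[1], cols)
    return dr + dc == 1
```

With this indexing, a cell in the bottom row and the cell in the same column of the top row are neighbours, so similar cells there belong to one cluster rather than two.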

Directly adjacent to clustering map 278 in window 274 of
FIG. 13 are summary statistics 312. Summary statistics 312
include occupancy, mean value, standard deviation, and
minimum and maximum values for the currently selected
parameter in the selected cell as well as the currently
selected parameter with respect to the whole cluster map.

When a new cluster map is loaded for analysis via
window 274, the default statistic for the map is occupancy,
i.e., the number of members in each cell. By selecting
occupancy button 294, standard deviation button 298,
minimum value button 300, or maximum value button 302 in
toolbar 276, the appropriate statistics will display in
summary statistics 312. In the example shown in FIG. 13, the
currently selected parameter is readership for the Daily
Mirror as represented by parameter name "MIRROR" in
parameter field 304. According to the example presented in
FIG. 13, approximately 18% of the occupants in the cell
selected with box 310, cluster number 10 as identified in the
"Cluster" field of summary statistics 312, read the Daily
Mirror, compared with the average of around 8% for the
whole map.
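The summary statistics described above can be sketched as follows. The record layout (a list of dictionaries with a parallel list of cell numbers), the function names, and the use of the population standard deviation are all assumptions made for illustration; the source does not specify them.

```python
from statistics import mean, pstdev


def summarize(values):
    """Occupancy, mean, standard deviation, minimum and maximum."""
    if not values:
        return {"occupancy": 0, "mean": None, "std": None,
                "min": None, "max": None}
    return {
        "occupancy": len(values),
        "mean": mean(values),
        "std": pstdev(values),  # population form assumed
        "min": min(values),
        "max": max(values),
    }


def cell_and_map_summary(records, cell_of, parameter, selected_cell):
    """Statistics for one cell next to those for the whole map."""
    all_values = [r[parameter] for r in records]
    cell_values = [r[parameter]
                   for r, c in zip(records, cell_of) if c == selected_cell]
    return summarize(cell_values), summarize(all_values)


# Hypothetical data: MIRROR is 1 for a reader and 0 otherwise.
records = [{"MIRROR": v} for v in (1, 1, 0, 0, 0, 0)]
cell_of = [10, 10, 10, 4, 4, 4]
cell, whole = cell_and_map_summary(records, cell_of, "MIRROR", 10)
```

For a 0/1 parameter such as this one, the mean is the fraction of readers, which is how a cell figure can be compared with the figure for the whole map.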

Parameter statistics information 280 in clustering analysis 
window 274 preferably displays the mean, standard 
deviation, and minimum and maximum value for all avail- 
able parameters, together with a flag ("Y") under the input 
column indicating whether the parameter was used during
the clustering process. Using pointing device 18, a param- 
eter in parameter statistics information 280 may be selected 
and plotted in parameter graphs 282. 

Parameter graphs 282 display a graph of the currently 
selected parameters. Each parameter is plotted on a hori- 
zontal scale, normalized from zero to one. Downward facing 
triangles 314 denote the average value of the parameter for 
the selected cell, compared with the average value of the 
parameter for the population as a whole that is represented 
by upward facing triangles 316. 

Parameter graphs 282 are controlled with control buttons 
318. Buttons 318 include all button 320 that when selected 
causes all parameters for the selected cell to be displayed in 
parameter graphs 282. All inputs button 322 may be selected 
to display a graph of all cluster inputs for a selected cluster 
cell. Selecting none button 324 prevents parameter graphs 
282 from being displayed. Also, control buttons 318 include 
options button 326 that pulls up a graph options dialogue 
box. 

FIG. 14 illustrates exemplary graph options dialogue box 
328 that may be used to modify the graphs displayed in 
parameter graphs 282 in clustering analysis window 274 
when options button 326 is selected. Using dialogue box 328 
the graphs in cluster analysis window 274 may be modified 
as desired. By selecting none option 330, which may be the 
default option, the parameters in parameter graphs 282 will 
be presented in the order that the parameters are stored in the 
data set. Selecting cluster mean option 332 allows the 
parameters to be displayed in the order of maximum mean 
value. By selecting cluster and population mean difference 
option 334, the parameters within a cell that vary most 
significantly from the norm, i.e., the overall population, are 
displayed. The parameters will then be ranked and presented 
in terms of maximum positive deviation through to maxi- 
mum negative deviation. Also, selecting cluster and popu- 
lation mean absolute difference option 336 allows the 
parameters to be ranked in absolute terms, i.e., ranked in 
terms of absolute variation from the population mean.
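The four ordering options can be sketched as sort keys over per-parameter means. This is an illustration only: the option names passed as `order`, the function name, and the assumption that the means are already normalized to the zero-to-one scale are not taken from the source.

```python
def rank_parameters(cluster_means, population_means, order="none"):
    """Return parameter names in the display order for one option."""
    names = list(cluster_means)
    if order == "none":
        return names  # order in which the parameters are stored
    if order == "cluster_mean":
        return sorted(names, key=lambda n: cluster_means[n], reverse=True)
    diff = {n: cluster_means[n] - population_means[n] for n in names}
    if order == "difference":
        # Maximum positive deviation through to maximum negative.
        return sorted(names, key=lambda n: diff[n], reverse=True)
    if order == "absolute_difference":
        return sorted(names, key=lambda n: abs(diff[n]), reverse=True)
    raise ValueError("unknown order: " + order)


# Hypothetical normalized means for one cell and for the population.
cluster = {"AGE": 0.3, "INCOME": 0.9, "MIRROR": 0.18}
population = {"AGE": 0.6, "INCOME": 0.7, "MIRROR": 0.08}
by_difference = rank_parameters(cluster, population, "difference")
by_absolute = rank_parameters(cluster, population, "absolute_difference")
```

In this made-up example AGE is last under the signed ordering because the cell is well below the population mean, but first under the absolute ordering because it is the largest deviation in either direction.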

Use labels checkbox 338 in graph options dialogue box 328
is typically checked as a default option. When not
checked, parameter graphs 282 will be drawn without the 
spread range bars, i.e., with just triangles 314 and 316. Also 
when checkbox 338 is not checked, parameter graphs 282 
will be scaled according to the range of the largest parameter 
in the set. Once the options in dialogue box 328 are 
specified, box 328 may be closed by selecting OK button 
337. Changing the graph options may be canceled at any 
time by selecting cancel button 325. 

Returning to FIG. 13, a histogram for any cell or group of
cells may be initiated by selecting histogram button 306 in
toolbar 276 in clustering analysis window 274. Selecting
histogram button 306 presents parameter distribution window
340, an example of which is shown in FIG. 15.
Window 340 allows for exploring the distribution of individual
parameters across one or more cluster cells.

Parameter distribution window 340 of FIG. 15 preferably
includes upper histogram 342 and lower histogram 344.
Window 340 also includes parameter list section 345 that
may be used to select the parameter for plotting in the
histograms. Histogram information section 346 includes
cluster number and occupancy field 346a, occupancy field
346b, mean field 346c, standard deviation field 346d, and
minimum value 346e and maximum value 346f fields for
both histograms and the whole data set.

In the example shown in FIG. 15, the distribution of the
parameter AGE selected in parameter list section 345 is shown in
upper histogram 342 for cluster cell number "2" and in lower
histogram 344 for cluster cell number "3". Summary statistics
346, e.g., occupancy, mean, standard deviation, and
minimum and maximum values, are also provided for upper
342 and lower 344 histograms compared to the data set as a
whole. Histograms 342 and 344 in window 340 may be
displayed as actual values, which is the case in the example
of FIG. 15. Alternatively, histograms 342 and 344 can be
displayed as percentage values by making appropriate
selections in values section 347 of window 340.
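The difference between the actual-value and percentage displays can be sketched with a simple binning routine. The bin boundaries, the function name, and the `as_percent` flag are assumptions for illustration; the source does not describe how the histograms are binned.

```python
def histogram(values, low, high, bins, as_percent=False):
    """Count values into equal-width bins between low and high."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        if v < low or v > high:
            continue  # values outside the plotted range are skipped
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    if not as_percent:
        return counts
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


# Hypothetical AGE values for the members of one cluster cell.
ages = [20, 25, 30, 35, 40, 60]
actual = histogram(ages, 20, 60, 4)
percent = histogram(ages, 20, 60, 4, as_percent=True)
```

Percentages make two cells with very different occupancy comparable on the same vertical scale, which actual counts do not.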

Copy Hist 1 button 348 and Copy Hist 2 button 350 copy
the associated histogram to a clipboard in order that the
histograms can be imported as bit maps to an appropriate
application, for example, a word processing application.
Once copied to the clipboard, the histograms may then be
pasted into a word processing document.

Neural prediction function 38 of data analysis system 10
of the present invention provides predictive modeling
capability. This capability may be particularly beneficial in a
customer analysis setting in predicting future behavior of
current or prospective customers by learning from actual
customer behavior. Neural prediction function 38 utilizes
supervised learning neural network technology having the
capability to learn from historical behavior stored in
database(s) 44. This technique may be used to predict any
aspect of behavior for which records of historical behavior
are stored in database(s) 44. For customer databases, this
behavior may include product preference, customer
profitability, credit risk, and likelihood of fraud. In
implementing a direct marketing campaign, for example, neural
prediction function 38 may be used to analyze records of
individuals who did and did not respond to marketing
campaigns. Function 38 may be used to score prospect lists
to identify those individuals most likely to respond to a
future marketing campaign.

Neural computing is an advanced statistical data processing
technique. Unlike conventional techniques that require
programming with complex rules and algorithms, neural
networks develop their own solutions to problems by learning
from examples taken from the real world. For suitable
applications, the technique can provide exceptional benefits
in the ability to rapidly develop effective, computationally
efficient solutions to complex data processing problems.
One embodiment of neural prediction function 38 of data
analysis system 10 of the present invention uses a type of
supervised learning neural network known as a "multi-layer
perceptron" (MLP) network. MLP comprises a large number
of simple interconnecting processing elements (neurons)
arranged in three layers. Each neuron within the architecture
combines the weighted outputs from the neurons in the
previous layers, passes this through a non-linear transfer
function, and feeds the results on to the neurons in the next
layer. The input neurons take in the input information, while
the output neurons provide the prediction. In the customer
data set example, the input neurons receive customer
information stored in a database and the output neurons produce
the customer behavior prediction.

FIG. 16A illustrates MLP neural prediction network 352.
Network 352 comprises three levels of neurons 354, including
first level 356, second level 358, and third level 360.
Each neuron in a given level connects to every neuron in the
level below and above it, where appropriate, by
interconnections 362.

FIG. 16B represents the functionality of each neuron 354
in neural network 352. Each neuron combines the weighted
outputs from the neurons in the previous layer, passes the
result through a non-linear transfer function, and feeds the
results to the next layer of neurons.

Because neural network 352 represented in FIGS. 16A
and 16B uses a non-linear processing element as the
fundamental building block of system neuron 354, it is capable
of modeling complex non-linear relationships. Also, the
weights on interconnections 362 (not explicitly shown)
determine the nature of the predictions made by neural
network 352. These weights are defined during a "training"
process with the system of weights effectively representing
"knowledge" derived from the training data.

FIG. 17 illustrates an exemplary flowchart for neural
prediction function 38 of data analysis system 10 of the
present invention. Neural prediction function 38 begins at
step 364 whenever one of prediction buttons 50 in main
toolbar 45 is selected (see FIG. 2). At step 366 in neural
prediction function 38 a predictive model setup is defined.
Step 366 essentially involves defining the parameters that are to
be predicted. In the customer database example, the
parameters to be predicted may include, for example, mail
responsiveness, credit risk, profitability, etc. Also at step 366
the parameters that the predictions are to be based on are
specified, e.g., age, income, etc. Additionally, at step 366 the
data may be divided into two groups: a training set for use
in developing the models and an independent test set for use
in testing the predictive capability.

At initialize predictive network step 368 in neural
prediction function 38, a random set of network weights for
interconnections 362 is generated. Next, at step 370 the
training process is started. In start training process step 370
neural prediction function 38 takes the first record and enters
the appropriate information into the neural network's input
neurons (neuron level 356 in FIG. 16A). Because these
initial weights are randomly chosen, the network's initial
output prediction is random. This initial prediction is
compared to known historical behavior for that record and a
training algorithm is used to alter the weights on
interconnections 362 so that the next time a similar record is
presented to the network its prediction will be closer to
known historical behavior.

At step 372 the training process is completed. This
involves repeating start training process step 370 a number
of times for all records in the training set. As neural
prediction function 38 goes through the training process, the
prediction will gradually get closer and closer to actual
values.

At step 374 the results of training process steps 370 and
372 may be tested. The neural network's predictive
capability is tested on a test sample data set. The goal at step 376
is to see how well the system predicts the known behavior
without prior knowledge of that behavior. One way to test
this prediction is to feed the prediction into rule based
segmentation function 34 to analyze the overall prediction
accuracy of the network and also provide information on the
correlation between record profile and prediction accuracy.

At step 376 in neural prediction function 38 in FIG. 17 an
iterative refinement of the neural prediction process may
take place. As previously stated, this may involve using rule
based segmentation function 34 to compare the predictive
performance between the training data set and the test data
set. Significantly better predictive performance on the
training set may be indicative of over-learning and poor model
generalization in the network. The objective of training steps
370 and 372 is to produce a model that accurately reflects the
complex interrelationships between the input and output
parameters yet is sufficiently simple to be generic. The
trade-off between accuracy and generalism is controlled via
the number of input parameters chosen and their encoding
schemes. As with any statistical modeling system, the model
predictions must be carefully examined as well as the
sensitivity of the predictions to each input parameter. The
model setup may be interactively refined to reproduce the
required performance.

Once step 376 is complete, neural prediction function 38
may be exited. The steps of neural prediction function 38
have now generated a predictive network that may be used
to predict expected behavior from a data set.

FIG. 18 illustrates an exemplary neural network file
specification window 378, which in one embodiment of the
present invention is the main window for neural prediction
function 38. Window 378 preferably includes toolbar 380.
Toolbar 380 includes initiate new network configuration
button 382, retrieve network configuration button 384, save
data input configuration button 386, and save data input
configuration as button 388. Toolbar 380 also includes run
button 390 for creating a file specification and stop button
392 for stopping a neural network file specification process.
Exit button 394 in toolbar 380 closes neural prediction
function 38.

Neural network file specification window 378 also
preferably includes input data files section 396. Section 396
allows for browsing and selecting from available data files
in a given directory a data file for use with neural prediction
function 38. Raw binary files may be selected via select files
button 398, which pulls up an appropriate input file selection
dialogue box.

FIG. 19 shows an exemplary input file selection dialogue
box 400 for selecting a file for processing with neural
prediction function 38 and is exemplary of a dialogue box
available when select files button 398 in window 378 is
selected. Dialogue box 400 preferably includes available
files section 402 and selected files section 404. Using
include button 406, remove button 408, and replace button
410, files may be moved from available files section 402 to
selected files section 404, and vice versa. Once the
appropriate files are selected for processing, done button 412
closes dialogue box 400. Alternatively, the file selection
process may be canceled at any time by selecting cancel
button 414. Access to additional directories for selecting
files may be obtained by selecting directory button 416.

In one embodiment of the present invention, using input
file selection dialogue box 400 up to five input files may be
selected for processing. These files may be of the same
length and have disjoint parameters. It is preferred, however,
that a single file be created from the several files using the
data merge capability of the present invention described in
discussions relating to FIGS. 29 and 30.

Returning to FIG. 18, neural network file specification
window 378 also preferably includes parameter selection
region 418. In one embodiment of neural prediction function
38 of the present invention, the maximum number of input
parameters to a neural network is forty. It is noted that the
maximum number of input parameters may be varied without
departing from the spirit and scope of the present
invention. Selecting the parameters for a model is
accomplished by choosing select parameters button 420 that
provides access to a window for specifying parameters.

FIG. 20 illustrates exemplary specify parameters window
422 for use in selecting the input and output parameters for
a neural network and is provided when select parameters
button 420 in window 378 is selected. Window 422
preferably includes available parameters section 424, which
provides a list of the parameters available for a data file.
Window 422 also includes input parameters section 426 for
selecting the input parameters for a neural prediction
network. Input parameters section 426 preferably includes
parameter name section 428 showing the names of the input
parameters and the scheme for the encoding of each
parameter, which will be described in discussions relating to
FIG. 21. Input parameters section 426 also has include
button 430 for adding a selected parameter in available
parameters section 424 to parameter name section 428 in
input parameters section 426. Remove button 432 removes
a selected parameter from parameter name section 428.
Encode button 434 in input parameters section 426 may be
selected to provide the encoding scheme for an input
parameter.

For each network input parameter, an encoding scheme
must be specified. In one embodiment of neural prediction
function 38 of data analysis system 10, three different
encoding schemes are supported and will be described in
discussions relating to FIG. 21. Also, the minimum and
maximum values for each input should be specified along
with the number of neurons over which the input value is to
be encoded. The minimum and maximum values are
typically the minimum and maximum values of the input
parameter. Exceptions may be necessary when the parameter
has a long-tailed distribution in which case some other value
may be selected, e.g., +/-4 standard deviations. Values of
the input parameter greater than the maximum specified
value are clipped to the maximum value, and parameter
values less than the minimum specified value are set to the
minimum value.

FIG. 21 shows an exemplary encode input parameter
dialogue box 436 that may be accessed when encode button
434 in input parameters section 426 of specify parameters
window 422 is selected. Dialogue box 436 may be used to
encode the input parameters and specify the minimum and
maximum input parameter values and the number of neurons
on which the input parameter is to be input.

As previously stated, one embodiment of the present
invention supports three types of input encoding schemes
including spread, clock spread, and one-in-N encoding.
Accordingly, dialogue box 436 preferably includes encoding
type section 438 for selecting the appropriate encoding
scheme for a parameter. Dialogue box 436 also includes
minimum value input 440, maximum value input 442, and
number of neurons input 444 for specifying these values for
a parameter. Once the information in dialogue box 436 is
complete, the selections in dialogue box 436 are accepted by
selecting OK button 446. Changes to an input parameter via
dialogue box 436 may be terminated at any time while in
dialogue box 436 by selecting cancel button 448.

With spread encoding, the parameter of interest is spread
across the neurons. If the number of encoding neurons in
number of neurons input field 444 is set to one, the minimum
input value is associated with an activation of zero, and the
maximum input value is associated with an activation of
one. The parameter value is then encoded by assigning an
activation linearly between these limits. Otherwise, the first
neuron is assigned a spot value corresponding to the
minimum value and the last neuron the maximum value. The
intermediate neurons are assigned spot values linearly from
the minimum to the maximum values. The activation values
for these neurons are then assigned so that the dot product
between the vector of activation values and the vector of
spot values equals the parameter value. Furthermore, the
sum of the activation values is equal to one and no more than
two neurons are activated. If a parameter value lies outside
the encoding range then it is encoded as if it was the
minimum value if it is less than the minimum value or
encoded as the maximum value if it is greater than the
maximum value. A typical example of a parameter that may
be spread encoded is a person's age, where a distribution is
required for thresholding purposes.

Clock spread encoding is similar to spread encoding
except that the first neuron is assigned two spot values (the
minimum and maximum), and the last neuron is viewed as
being adjacent to the first neuron in a circular fashion. This
method is useful for encoding times, angles, etc., because it
gives a smooth transition in activation values when the
parameter value goes full circle.

In one-in-N encoding, N neurons are defined. Only one
neuron is given an activation value of one with the rest of the
neurons receiving an activation value of zero. If a parameter
value is the minimum value, then the first neuron is
activated; if it is the minimum value plus one, then the second
is activated, etc. If the input parameter value is less than the
minimum value then the first neuron is activated. If the
parameter value is greater than the maximum value then the
last neuron is activated. The number of neurons must be the
maximum parameter value minus the minimum parameter
value plus one.

Additionally, in one-in-N encoding, each neuron
corresponds to a class. To encode a parameter value, a class for
each parameter is determined and the corresponding neuron
is given an activation of one. All other neurons are given an
activation of zero. Examples of parameters that may be
encoded using one-in-N encoding include marital status
(three neurons: married, single, and divorced), gender (two
neurons: male and female), and income (multiple neurons
corresponding to income ranges).

Returning to FIG. 20, specify parameters window 422
also preferably includes output parameters section 450,
which identifies the parameters to be predicted based on the
data set. Similar to input parameters section 426, output
parameters section 450 includes parameter name section
452, include button 454, remove button 456, and encode
button 458. The output parameters for a neural network may
be selected and encoded as previously described for the
input parameters in discussions relating
to input parameters section 426.

By selecting encode button 458 in output parameters
section 450, a dialogue box similar to encode input
parameter dialogue box 436 in FIG. 21 is provided. Output
parameters, however, are preferably encoded in one of two
schemes: spread and one-in-N encoding schemes.
Additionally, a single output parameter is preferably
specified in output parameter section 450 of parameter name
section 452.

In the outputs of a neural prediction network the dot
product between the vector of activation values and the
vector of spot values divided by the maximum activation
value equals the output parameter value, i.e., the predicted
value from the network. For spread output encoding with
only two neurons, the output values are taken to be the dot
product of the activations with the spot values, divided by
the sum of the activations. Where there are more than two
output neurons, the sum of the activations on adjacent pairs
of neurons should be determined. The sums of these neurons
are examined and the highest one is selected. The activations
and spot values for this pair of neurons may be used in
decoding a pair of neurons. For one-in-N output encoding,
the neuron having the highest activation is noted, and the
output from the network is then the class that corresponds to
the highest activated neuron.
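The spread and one-in-N rules described above can be sketched as follows. This is one reading of the text rather than the system's actual code: the function names are hypothetical, one-in-N values are assumed to be integers, the single-neuron decode is assumed to be the inverse of the single-neuron encode, and decoding the highest adjacent pair is assumed to reuse the two-neuron formula.

```python
def spot_values(minimum, maximum, neurons):
    """Spot values spaced linearly from minimum to maximum."""
    step = (maximum - minimum) / (neurons - 1)
    return [minimum + i * step for i in range(neurons)]


def spread_encode(value, minimum, maximum, neurons):
    """Spread a value over at most two adjacent neurons."""
    value = min(max(value, minimum), maximum)  # clip to the range
    if neurons == 1:
        return [(value - minimum) / (maximum - minimum)]
    step = (maximum - minimum) / (neurons - 1)
    position = (value - minimum) / step
    lower = min(int(position), neurons - 2)
    upper_share = position - lower
    activations = [0.0] * neurons
    activations[lower] = 1.0 - upper_share
    activations[lower + 1] = upper_share
    return activations


def spread_decode(activations, minimum, maximum):
    """Decode from the adjacent pair with the highest summed activation."""
    neurons = len(activations)
    if neurons == 1:
        return minimum + activations[0] * (maximum - minimum)
    spots = spot_values(minimum, maximum, neurons)
    best = max(range(neurons - 1),
               key=lambda i: activations[i] + activations[i + 1])
    pair_sum = activations[best] + activations[best + 1]
    if pair_sum == 0:
        raise ValueError("no active neurons to decode")
    weighted = (activations[best] * spots[best]
                + activations[best + 1] * spots[best + 1])
    return weighted / pair_sum


def one_in_n_encode(value, minimum, maximum):
    """Activate exactly one of (maximum - minimum + 1) neurons."""
    index = min(max(value, minimum), maximum) - minimum
    return [1.0 if i == index else 0.0
            for i in range(maximum - minimum + 1)]


def one_in_n_decode(activations, minimum):
    """Return the class of the most highly activated neuron."""
    best = max(range(len(activations)), key=activations.__getitem__)
    return minimum + best


# Hypothetical example: age 35 spread over five neurons on [20, 60].
encoded_age = spread_encode(35, 20, 60, 5)
decoded_age = spread_decode(encoded_age, 20, 60)
```

With spot values 20, 30, 40, 50 and 60, an age of 35 lies halfway between the second and third neurons, so those two share the activation equally and the rest stay at zero.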

Continuing with FIG. 20, once the input parameters and 
output parameters are selected, specify parameters window 
422 may be closed via OK button 460. Alternatively, a
parameter selection may be canceled at any time with cancel 
button 462. 

Returning to neural network file specification window 378
shown in FIG. 18, window 378 also preferably includes
generation status messages region 464. Region 464 provides
processing status information once a neural run is initiated
via run button 390. Typically, messages are generated for
every 1,000 records processed, followed by a "finish
processing at record N" message. Also, once the neural
processing is complete, generation status messages region 464 will
provide other information, such as the name of the output
file, the encoding data file, the header data file, and that the
configuration has been saved in the output file. A neural run
may be suspended at any time via stop button 392.

Once a predictive network configuration has been specified
as described above, neural prediction function 38 may
be run on a data set. With neural prediction function 38 of
system 10, one or more networks that have been previously
specified may be run on a data set. In one embodiment of the
present invention, up to thirty different runs, in which the
network is run in either training (learning) mode or predict
(recall) mode, are possible.

FIG. 22 illustrates an exemplary run neural network
window 466 for running a network configuration on a data
set. Run neural network window 466 preferably includes
toolbar 468 having initiate new input configuration button
470, open new data configuration button 472, save data input
configuration button 474, and save data input configuration
as button 476. Run button 478 in toolbar 468 initiates a run
with a neural network. View analysis graph button 480 may
be selected to display a graph that gives an indication of the
status of a neural network's training. Exit button 482 in
toolbar 468 may be selected to close neural prediction
function 38.

Run neural network window 466 includes data and
configuration section 484 that displays a list of currently
specified network runs. In the example shown in FIG. 22, two
batch runs have been specified. In the first run (Run 01),
parameter data is retrieved from a training file, using records
from one up to 18,000 with ten iterative training cycles
through this data. In the second run (Run 02), neural
prediction function 38 is set for predictive, recall mode using
records 18,001 through 20,000.

Run neural network window 466 of FIG. 22 also includes
weights and results files section 486. Section 486 displays
the selections made for these attributes. In the example
shown in FIG. 22, Run 01 is using a random weights file for
input purposes, but is storing the trained weights in file
FEG.BWT; the results themselves are not being stored to a
file. In Run 02 weights are input via the BATCH facility, but
trained weights are not being stored. The corresponding
results in Run 02 are stored in file FEG.BRS.

Network configurations section 488 is also provided
within run neural network window 466. Section 488 displays
the options selected and defined for training or
prediction, respectively. Each network configuration in
section 488 is given a unique identifier, e.g., predict or train,
together with a textual description. Adding or editing these
definitions is possible by selecting add button 490 or edit
button 492, respectively, which provide an appropriate edit
network configuration window.

FIG. 23 illustrates an exemplary edit network configura- 
tion window 494 that in one embodiment of the present 
invention may be used to edit a network configuration.
Window 494 preferably includes network configuration 
identification (ID) field 496 that shows the name of a 
particular network configuration ID. Description field 498 in 
window 494 provides a description of the network configu- 
ration ID in field 496. 

Edit network configuration window 494 of FIG. 23 also
preferably includes network parameters section 500. Section
500 includes number of middle neurons field 502, learning
rate field 504, and momentum field 506. The number of
middle neurons is typically set to a value equal to
approximately 25% of the number of input neurons. The learning
rate parameter typically ranges between 0 and 1 and
determines the speed of the convergence of the training process.
A typical learning rate is 0.3 as shown in FIG. 23, although
some experimentation may be required. Small learning rates
can lead to excessive training times, while high learning
rates can result in a network failing to converge. The
momentum parameter allows the network to avoid
distortions in the training process and enables the network to
evade local minima. The momentum parameter can also be
thought of as a smoothing factor applied to the error
rate/correction process. Care should be taken, however, to avoid
overshooting the global minimum error, i.e., the optimum
solution. A typical value for the momentum parameter is 0.2.
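
By way of illustration only, the roles of the learning rate and momentum parameters described above can be sketched in Python. The function names and the single-weight example are assumptions made for this sketch and do not appear in FIG. 23. The update rule shown is the conventional gradient-descent step with a momentum term, which may differ from the rule actually used by neural prediction function 38.

```python
def suggested_middle_neurons(n_inputs: int, fraction: float = 0.25) -> int:
    """Rule of thumb from the text: roughly 25% of the input neurons (at least one)."""
    return max(1, round(n_inputs * fraction))


def momentum_step(weight, gradient, prev_delta, learning_rate=0.3, momentum=0.2):
    """One conventional gradient-descent update with a momentum term.

    The momentum term adds a fraction of the previous change, which smooths
    successive corrections. Returns (new_weight, delta).
    """
    delta = -learning_rate * gradient + momentum * prev_delta
    return weight + delta, delta


def minimise(grad_fn, start, cycles, learning_rate=0.3, momentum=0.2):
    """Apply `cycles` momentum updates to a single weight (hypothetical helper)."""
    weight, delta = start, 0.0
    for _ in range(cycles):
        weight, delta = momentum_step(
            weight, grad_fn(weight), delta, learning_rate, momentum
        )
    return weight
```

For example, with the typical values of 0.3 and 0.2, `minimise(lambda w: 2 * (w - 3), 0.0, 100)` settles close to 3, the minimum of the error function (w - 3)^2.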

Mode section 507 in edit network configuration window
494 must be used to specify the mode of the run as either
train or forecast. Also, user information section 508 is
used to set whether a display is to be provided during the
run. When a display is provided, its frequency
may be set in display interval field 510.

In one embodiment of neural prediction function 38 of
data analysis system 10, three training completion criteria
can be used. For the first criterion, the maximum number of
training cycles is specified. One cycle is
one complete pass through the training set during training.
The second type of completion criterion involves setting an
error goal. When the error over the entire training set
reaches the set error goal, the training is complete. The third
completion criterion stops training when the error begins to
increase over an evaluation data set. This particular method
avoids the network overfitting the training set, i.e., it avoids
the problem of poor generalization to data outside the
training set.
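
The three completion criteria described above can be sketched as a single training loop. This is a minimal Python sketch under stated assumptions: the callables `train_cycle` and `eval_error`, and the returned reason strings, are hypothetical names introduced here and are not part of neural prediction function 38.

```python
from typing import Callable, Optional, Tuple


def train_until_complete(
    train_cycle: Callable[[], float],
    max_cycles: int,
    error_goal: Optional[float] = None,
    eval_error: Optional[Callable[[], float]] = None,
) -> Tuple[int, str]:
    """Run training cycles until one of three completion criteria fires.

    train_cycle() performs one complete pass through the training set and
    returns the training error; eval_error() returns the current error on a
    separate evaluation data set. Returns (cycles_run, reason).
    """
    previous_eval = float("inf")
    for cycle in range(1, max_cycles + 1):
        training_error = train_cycle()
        # Second criterion: the training error has reached the error goal.
        if error_goal is not None and training_error <= error_goal:
            return cycle, "error goal reached"
        # Third criterion: the evaluation error has begun to increase.
        if eval_error is not None:
            current_eval = eval_error()
            if current_eval > previous_eval:
                return cycle, "evaluation error increased"
            previous_eval = current_eval
    # First criterion: the maximum number of training cycles was used.
    return max_cycles, "maximum cycles reached"
```

Passing only `max_cycles` gives the first criterion alone; supplying `error_goal` or `eval_error` enables the second or third.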

Returning to FIG. 22, batch run set-up section 512 is also 
preferably provided in window 466. Batch run set-up section 
512 includes add button 514, edit button 516, and remove 
button 518. By selecting add button 514 or edit button 516 
a window for editing a run may be provided. 

FIG. 24 illustrates an exemplary edit run window 520 for 
editing a particular run. Window 520 preferably includes 
training data file section 522, forecast data file section 524, 
input weights section 526, output weights section 528, and 
results file section 530. Window 520 and its attendant
sections allow for browsing and selecting a training data
file via training data file section 522. Window 520 also
provides for browsing and selecting a forecast data file via
forecast data file section 524. Input weights section 526
allows for selecting either random or batch weights. When
batch weights are specified, the weights file corresponding
to the previous run will be used. Therefore, the batch option
in section 526 is not available for the first run in a sequence.
Output weights section 528 provides for browsing and
selecting an output weights file. Results file section 530 also
provides for browsing and selecting a results file.

Edit run window 520 can also be used to specify the start 
and end record for a run in section 532. Section 534 in 
window 520 may be used to specify the start and end 
forecast record for a run. For example, if a file contained 
200,000 records, training on 190,000 records with testing on 
10,000 records could be specified with fields 532 and 534. 
Also the number of training cycles may be specified in 
training cycle field 536. The neural network's configuration 
may be set in network configuration section 538 to give it a 
textual identifier, e.g., train, test, or predict. Once the fields 
in edit run window 520 are appropriately modified, window 
520 may be closed via done button 540. Cancel button 542 
may be used at any time to cancel inputs to window 520. 
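
As an illustration of the settings gathered by edit run window 520, a batch of runs might be represented as follows. This is a sketch only: the `Run` class, its field names, and `validate_batch` are hypothetical and are not taken from the figures. The example batch mirrors the two runs described for FIG. 22.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    config_id: str                      # network configuration, e.g. "train" or "predict"
    start_record: int
    end_record: int
    training_cycles: int = 0
    input_weights: str = "random"       # "random" or "batch"
    output_weights_file: Optional[str] = None
    results_file: Optional[str] = None


def validate_batch(runs: List[Run]) -> List[str]:
    """Return a list of problems found in a sequence of runs (empty if none)."""
    problems = []
    for index, run in enumerate(runs):
        label = f"Run {index + 1:02d}"
        if run.start_record < 1 or run.end_record < run.start_record:
            problems.append(f"{label}: invalid record range")
        # Batch weights come from the previous run, so the first run cannot use them.
        if run.input_weights == "batch" and index == 0:
            problems.append(f"{label}: batch weights need a previous run")
    return problems


# The two runs described for FIG. 22.
EXAMPLE_BATCH = [
    Run("train", 1, 18000, training_cycles=10, output_weights_file="FEG.BWT"),
    Run("predict", 18001, 20000, input_weights="batch", results_file="FEG.BRS"),
]
```

Calling `validate_batch(EXAMPLE_BATCH)` returns an empty list, whereas a batch whose first run requests batch weights is reported as a problem.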

Returning to FIG. 22, run neural network window 466 
also includes output options section 544. In one embodiment 
of neural prediction function 38 of data analysis system 10
of the present invention, three output options as shown in 
output options field 544 in FIG. 22 are preferably provided. 
These output options include a text option, a graphic option, 
and an analysis graphs option. 

FIG. 25 illustrates an exemplary text neural network 
results window 545 corresponding to the text option in 
output options 544 in window 466 of FIG. 22. Window 545 
preferably includes status information section 546 that 
includes current run field 548 providing status information
on the current run. Number of training cycles in run field 550
specifies the number of training cycles for a particular run, and
current cycle field 552 specifies the cycle of the run at its 
current state. 

Neural network results window 545 also preferably 
includes go button 554 for initiating a particular session. 
Done button 556 indicates when a particular run is complete.
By selecting about button 558 information on the current run 
may be viewed, and by selecting pause button 560 the 
current run will be suspended. Results information section 
562 in window 545 presents textual information on the 
results of the current neural prediction. 

FIG. 26 illustrates an exemplary graphic neural network 
results window 564 corresponding to the graphic option in 
output options 544 in window 466 of FIG. 22. Window 564 
preferably includes status information section 546 as previ- 
ously described. Window 564 also includes graph section 
566 providing a graphical representation of the status of a 
particular neural prediction run. The example in FIG. 26 
shows a graphical representation of the actual value of the 
variable being predicted, compared with the predicted value, 
for 48 examples in a training set. As training proceeds, the 
predicted value and the actual value should converge. 

When the analysis graphs option is selected in output 
options 544 in window 466 of FIG. 22, neural prediction 
function 38 of data analysis system 10 records the errors 
generated throughout the training run and saves them to a 
temporary file. Once the training is concluded, a graph of 
these errors may be viewed by selecting graph button 480 in 
toolbar 468 as shown in FIG. 22. This produces an appropriate
dialogue box for selecting the criteria for plotting the
errors. 

FIG. 27 illustrates an exemplary graph definition dialogue
box 570 that may be produced by neural prediction function
38 when the analysis graphs option in output options 544 in
window 466 of FIG. 22 is selected. Dialogue box 570
preferably includes plot type section 572 for specifying
whether the actual error or absolute error should be used in
graphing the errors and also whether icons are to be used in
plotting the analysis graphs. Also provided within dialogue
box 570 is batch information section 574 that lists the
number of batches currently in consideration. Once the
information within dialogue box 570 is acceptable, OK
button 576 may be selected. Alternatively, this operation
may be canceled by selecting cancel button 578.

FIG. 28 illustrates an exemplary post network graphical
analysis window 568 having toolbar 569. Toolbar 569
preferably includes a number of buttons for accessing graphing
options and exit button 598. Window 568 shows exemplary
absolute error plot 580 generated in accordance with
neural prediction function 38 of data analysis system 10. The
example of FIG. 28 displays a plot of absolute error over 19
training intervals (cycles), and from FIG. 28 it can be seen
that the error has reduced progressively from an average of
above 0.738 down to 0.08 over this training interval.
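
The distinction between plotting the actual error and the absolute error can be illustrated with a short Python sketch. It assumes that "actual error" means the signed difference between the predicted and actual values; that interpretation, and the function name `mean_errors`, are assumptions of this sketch rather than statements from the figures.

```python
from typing import Sequence, Tuple


def mean_errors(actual: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (mean signed error, mean absolute error) for one training cycle."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    differences = [p - a for a, p in zip(actual, predicted)]
    count = len(differences)
    mean_signed = sum(differences) / count
    mean_absolute = sum(abs(d) for d in differences) / count
    return mean_signed, mean_absolute
```

Signed errors of opposite sign cancel in the average, so `mean_errors([1.0, 2.0], [2.0, 1.0])` gives a signed mean of 0.0 but an absolute mean of 1.0. This is one reason an absolute error plot can be the more informative view of training progress.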

Data management function 42 of data analysis system 10
of the present invention may be used to perform several
operations on data produced by data processor 32. Typical
operations available with data management function 42
include data merge, data append, and data paring operations.

FIG. 29 illustrates data merge function 582 in accordance
with data management function 42 that allows for combining
two different files into a single file. Data merge function
582 allows file 584 of length n records and containing x
parameters to be merged with file 586, also of length n
records but with y parameters, into new file 588 with n
records containing x+y parameters. To facilitate the merge
operation, input files 584 and 586 preferably do not share
any parameter names in common with each other.
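
The merge operation described above can be sketched in Python as follows. The representation of a file as a list of records, each a dictionary keyed by parameter name, is an assumption of this sketch, as is the function name `merge_files`. The text states only that shared parameter names are preferably avoided; the sketch treats them as an error for simplicity.

```python
from typing import Dict, List, Set

Record = Dict[str, float]


def parameter_names(records: List[Record]) -> Set[str]:
    """Collect every parameter name used in a file."""
    names: Set[str] = set()
    for record in records:
        names.update(record)
    return names


def merge_files(first: List[Record], second: List[Record]) -> List[Record]:
    """Merge n records with x parameters and n records with y parameters
    into n records with x + y parameters."""
    if len(first) != len(second):
        raise ValueError("both files must contain the same number of records")
    shared = parameter_names(first) & parameter_names(second)
    if shared:
        raise ValueError(f"parameter names appear in both files: {sorted(shared)}")
    # Records are paired by position: record i of each file becomes record i of the output.
    return [{**a, **b} for a, b in zip(first, second)]
```

For instance, merging two records holding `age` with two records holding `income` yields two records that each hold both parameters.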

FIG. 30 illustrates an exemplary data merge window 590
for performing data merge function 582 depicted in FIG. 29.
Data merge window 590 preferably includes toolbar 592
having run button 594, stop button 596, and exit button 598.
Data merge window 590 also preferably includes input files
section 600 for specifying the files to be merged in first file
field 600a and second file field 600b. Each file field has
browse button 602 for providing a file list for selecting files
for merging and parameter information 603 for providing
information on the parameters in each file.

Data merge window 590 also preferably includes output
file section 604 for specifying the file destination and name
for the merge of the input files specified in input files section
600. Output file section 604 similarly includes browse
button 606 for selecting the destination and name for the
merged data files. Data merge window 590 also preferably
includes conversion status region 608 that provides a textual
description of the status of a data merge.

FIG. 31 illustrates data append operation 610, which may
be part of data management function 42 of data analysis
system 10, for combining two files of the same type into a
single file. The two files, e.g., files 612 and 614 in FIG. 31,
contain identical parameter lists in each file. The parameters
in the input files should also be in the same order, with the
files also having the same length. Appending file 612 with
file 614 results in output file 616.
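
A minimal Python sketch of the append operation follows. The representation of a file as a pair of (parameter names, rows) and the function name `append_files` are assumptions of this sketch. It checks only that the parameter lists are identical and in the same order; the same-length condition mentioned above is not enforced here.

```python
from typing import List, Sequence, Tuple

DataFile = Tuple[List[str], List[Sequence[float]]]


def append_files(first: DataFile, second: DataFile) -> DataFile:
    """Append the rows of `second` to the rows of `first`."""
    first_names, first_rows = first
    second_names, second_rows = second
    # List comparison checks both the names and their order.
    if first_names != second_names:
        raise ValueError("parameter lists must be identical and in the same order")
    return list(first_names), list(first_rows) + list(second_rows)
```

Unlike a merge, which widens each record, an append leaves the parameter list unchanged and increases the number of records.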

FIG. 32 illustrates an exemplary data append window 618
for performing data append operation 610 as shown in FIG.
31. Data append window 618 preferably includes toolbar
592 as previously described. Data append window 618
preferably includes input files section 620 that includes
directory button 622. By selecting directory button 622 and
selecting a directory in directory field 624, a file listing in file
listing section 626 is provided. Using include button 628 and
remove button 630 the files to be appended may be specified
and moved to selected files section 632. Once a file is
selected in file listing section 626, file format section 634
provides the parameter names within the file in parameter
name section 636 and the number of parameters in the file
in number of parameters field 638.

Window 618 also preferably includes output file section
640 having browse button 642 from which a listing of file
names may be selected for the output file, e.g., output file
616 in FIG. 31. Output file field 644 displays the output file
name. Data append window 618 also preferably includes
conversion status section 646 that provides status information
on a particular data append operation once run button
594 is selected.

Data management function 42 also preferably includes a
data paring capability that allows for paring down a file to
contain only those parameters of interest. For example, a
tagged file from neural clustering function 36 could be pared
down to contain only the cluster identifications and other
parameters required for further analysis. Another possible
use of the data paring capability is for rationing stored data
because of disk storage limitations.

FIG. 33 illustrates an exemplary data paring window 648.
Window 648 preferably includes toolbar 592 previously
described. Window 648 includes files section 650 having
input file field 652 and output file field 654 for specifying the
name of the input file to be pared and the name the pared file
is to be stored under. Both input 652 and output 654 file
fields have a browse capability that may be activated using
browse buttons 656. Window 648 also preferably includes
parameter selection region 658 including not selected
parameter list 660 and selected parameter list 662. Using
include button 664, include all button 666, remove button
668, and remove all button 670, the parameters may be
moved between lists 660 and 662 to specify a desired data
paring.

Data paring window 648 also preferably includes conversion
status section 672 that provides a textual description of
the status of a particular paring operation once run button
594 is selected.

FIG. 34 shows an exemplary data output window 672 that
provides access to a data output function within data
acquisition and output function 40 of data analysis system 10 of the
present invention. The data output function of data analysis
system 10 allows for converting files processed by system
10 back to a format that can be used by other programs. In
the embodiment of system 10 where data is processed as a
binary file, the data output function of data acquisition and
output function 40 may convert a binary file into, for
example, an ASCII text format file.

Window 672 preferably includes toolbar 592 previously
described. Window 672 preferably includes information
section 674 having input file field 676, header file field 678,
and output file field 680. A listing of file names for each of
these fields may be accessed by selecting one of browse
buttons 682. Output delimiter field 684 specifies the type of
delimiter to be used in the output file. In the example shown
in FIG. 34, space, tab, and comma delimiters are available
delimiters for the output file.

Window 672 also preferably includes file information
section 686 providing additional information on the input
file including the number of records, the number of
parameters, and the names of the parameters within the input
file. Select all button 688 within file information section 686
may be selected to include all of the records and their
parameters in the input file for conversion to the output file.
Data output window 672 also preferably includes record
selection region 690 that allows for specifying the start
record and finish record for the input file for conversion to
output file. Also, window 672 preferably includes conversion
status region 692 that provides a textual description
of the status of a data output operation once run button 594
is selected.

Although the present invention has been described in
detail, it should be understood that various changes,
substitutions, and alterations can be made hereto without
departing from the spirit and scope of the invention as
defined by the appended claims.

What is claimed is:

1. A system for analyzing a data file containing a plurality
of data records, each data record containing a plurality of
parameters, the system comprising:

an input for receiving the data file; and

a data processor comprising a clustering function for
clustering the data records into a plurality of clusters
containing data records having similar parameters
wherein the clustering function is further operable to
generate a cluster map including a graphical depiction
of the clusters, wherein the cluster map comprises a
plurality of graphical elements each having a graphical
depiction indicative of a number of records in a cluster.

2. The system of claim 1 wherein the input is further
operable to convert the data records into a processing format
for the data processor.

3. The system of claim 2 further comprising an output
operable to convert the data records in the processing format
back to their original format.

4. The system of claim [illegible] wherein the data records in the
[illegible] processed in binary [illegible] processor.

5. The system of claim 1 further comprising a data
manager for manipulating the data file.

6. The system of claim 5 wherein the data manager further
comprises a data append function for appending data files.

7. The system of claim 5 wherein the data manager further
comprises a data merge function for merging data files.

8. The system of claim 5 wherein the data manager further
comprises a data paring function for paring parameters from
a data file.

9. The system of claim 1 wherein the cluster map is color
coded to depict the relative number of records in each
cluster.

10. The system of claim 1 wherein the clustering function
is further operable to provide statistics for each parameter
for the records in a cluster.

11. The system of claim 1 wherein the clustering function
is further operable to provide a parameter graph for each
parameter in the records in a cluster.

12. The system of claim 1 wherein the clustering function
further comprises a neural clustering function.

13. The system of claim 1 wherein the data processor
further comprises a prediction function for predicting
expected future results from the parameters in the data
records.

14. The system of claim 13 wherein the prediction function
further comprises a neural prediction function.

15. The system of claim 1 wherein the data processor
further comprises a segmentation function for segmenting
the data records into a plurality of segments based on the
parameters.

16. The system of claim 15 wherein the segmentation
function is further operable to provide statistics on the data
records.

17. The system of claim 15 wherein the segmentation
function is further operable to segment the data records into
a plurality of segments using segmentation logic.

18. The system of claim 15 wherein the segmentation
function is further operable to segment an existing segment
into additional segments.

19. The system of claim 1 wherein the clustering function
is further operable to identify characteristic profiles for each
group.

20. The system of claim 13 wherein the prediction function
employs a multi-layer perceptron network in predicting
the expected future results.

21. The system of claim 1 wherein the data records further
comprise customer data records containing a plurality of
customer parameters in each customer record.

22. The system of claim 15 wherein the data records
further comprise customer data records containing a plurality
of customer parameters in each customer record and
wherein the segmentation function is further operable to
segment the customer data records into logical groups of
customers.

23. The system of claim 21 wherein the clustering function
is further operable to cluster customer data records into
statistically significant groups of customers.

24. The system of claim 13 wherein the data records
further comprise customer data records containing a plurality
of customer parameters in each customer record and
wherein the prediction function is further operable to predict
customer behavior from the customer data records.

25. The system of claim 13 wherein the data records
further comprise customer data records containing a plurality
of customer parameters in each customer record and
wherein the prediction function is further operable to predict
customer behavior from current customer data records.

26. The system of claim 15 wherein the segmentation
function is further operable to identify statistics for each
segment.

27. The system of claim 15 wherein the segmentation
function is further operable to identify statistical distributions
for each segment.

28. The system of claim 15 wherein the segmentation
function is further operable to generate a histogram for each
parameter in the data records.

29. The system of claim 15 wherein the segmentation
function is further operable to generate a histogram for a
data segment.

30. The system of claim 1 wherein the clustering function
is further operable to generate a histogram for each cluster.

31. A system for analyzing a data file containing a
plurality of customer data records, each data record containing
a plurality of customer parameters, the system comprising:

an input for receiving the data file; and

a data processor for processing the data records, the data
processor further comprising

a segmentation function for segmenting the customer
data records into a plurality of segments based on the
parameters,

a clustering function for clustering the customer data
records into a plurality of customer groups having
similar parameters, and

a prediction function for predicting customer behavior
from the customer data records.

32. The system of claim 31 further comprising a data
manager for manipulating the data file.

33. The system of claim 32 wherein the data manager
further comprises a data append function for appending data
files.

34. The system of claim 32 wherein the data manager
further comprises a data merge function for merging data
files.

35. The system of claim 32 wherein the data manager
further comprises a data paring function for paring parameters
from a data file.

36. The system of claim 31 wherein the segmentation
function is further operable to provide statistics on the data
records.

37. The system of claim 31 wherein the segmentation
function is further operable to segment the data records into
a plurality of segments using segmentation logic.

38. The system of claim 31 wherein the segmentation
function is further operable to segment an existing segment
into additional segments.

39. The system of claim 31 wherein the clustering function
is further operable to identify characteristic profiles for
each customer group.

40. The system of claim 31 wherein the prediction function
is further operable to predict prospective customer
behavior from current customer data records.

41. The system of claim 31 wherein the segmentation
function is further operable to identify statistical distributions
for each segment.

42. The system of claim 31 wherein the segmentation
function is further operable to generate a histogram for each
parameter in the data records.

43. The system of claim 31 wherein the segmentation
function is further operable to generate a histogram for a
segment.

44. The system of claim 31 wherein the clustering function
is operable to generate a histogram for each cluster.

45. The system of claim 31 wherein the clustering function
is further operable to generate a cluster map depicting
the number of records in each cluster.

46. The system of claim 45 wherein the cluster map is
color coded to depict the relative number of records in each
cluster.

47. The system of claim 31 wherein the clustering function
is further operable to provide statistics for each parameter
for the records in a cluster.

48. The system of claim 31 wherein the clustering function
is further operable to provide a parameter graph for
each parameter in the records in a cluster.

49. A method for analyzing a data file containing a
plurality of data records, each data record containing a
plurality of parameters, the method comprises the steps of:

inputting the data file; and

processing the data file by

segmenting the data records into a plurality of segments
based on the parameters,

clustering the data records into a plurality of clusters
containing data records having similar parameters,
and

predicting expected future results from the parameters
in the data records.

50. The method of claim 49 wherein the inputting step
further comprises converting the data records into a
predetermined processing format.

51. The method of claim 50 further comprising the step of
converting the data records in the processing format back to
their original format.

52. The method of claim 49 further comprising the step of
appending data files together.

53. The method of claim 49 further comprising the step of
merging data files together.

54. The method of claim 49 further comprising the step of
paring parameters from a data file.

55. The method of claim 49 wherein the segmenting step
further comprises providing statistics on the data records. 

56. The method of claim 49 wherein the segmenting step 
further comprises segmenting an existing segment into addi- 
tional segments. 

57. The method of claim 49 wherein the clustering step 
further comprises clustering the data records into groups 
having similar parameters. 

58. The method of claim 57 wherein the clustering step 
further comprises identifying characteristic profiles for each 
group. 

59. The method of claim 49 wherein the data records 
further comprise customer data records containing a plural- 
ity of customer parameters in each customer record. 

60. The method of claim 59 wherein the segmenting step 
further comprises segmenting the customer data records into 
groups of customers. 






61. The method of claim 59 wherein the clustering step 
further comprises clustering customer data records into 
statistically significant groups of customers. 

62. The method of claim 59 wherein the predicting step
further comprises predicting customer behavior from the
customer data records.

63. The method of claim 59 wherein the predicting step 
further comprises predicting prospective customer behavior 
from the customer data records. 

64. The method of claim 49 wherein the segmenting step 
further comprises generating a histogram for each parameter 
in the data records.

65. The method of claim 49 wherein the segmenting step 
further comprises generating a histogram for a data segment. 

66. The method of claim 49 wherein the clustering step 
further comprises generating a histogram for each cluster. 